Abstract

We consider the following non-interactive simulation problem: Alice and Bob observe sequences $X^n$ and $Y^n$
respectively, where $\{(X_i, Y_i)\}_{i=1}^n$ are drawn i.i.d. from $P(x,y)$, and they output $U$ and $V$ respectively, which are
required to have a joint law that is close in total variation to a specified $Q(u,v)$. It is known that the maximal
correlation of $U$ and $V$ must necessarily be no bigger than that of $X$ and $Y$ if this is to be possible. Our main
contribution is to bring hypercontractivity to bear as a tool on this problem. In particular, we show that if $P(x,y)$
is the doubly symmetric binary source, then hypercontractivity provides stronger impossibility results than maximal
correlation. Finally, we extend these tools to provide impossibility results for the $k$-agent version of this problem.


I. Introduction 

The problem of simulating random variables by two agents with suitable resource constraints has had a rich 
history leading to different formulations of this problem in the literature. The general setup for the problem is as 
follows: Two or more agents wish to simulate a specified joint distribution under resource constraints in the form of 
limited communication, limited common randomness provided to all of them, or limited correlation between their 
observations. One then wishes to find the minimum resources required to achieve the desired goal. 

The simulation problem has natural applications in numerous areas, from game-theoretic coordination in
a network against an adversary to control of a dynamical system over a distributed network. These problems are
expected to be important in many future technologies with remote-controlled applications, such as Amazon's
drone-based delivery system [1] and robotic environmental cleanup, vegetation management, land clearing, and bio-mass
harvesting [2]. In these technologies, individual robotic components would need to take randomized actions under
limited or no communication with other components or the central system. Study of the simulation problem can 
provide fundamental limits on the capabilities of such robotic components and guide efficient usage of the available 
resources. 

The earliest studied two-agent simulation problems were considered by Gács and Körner [3], and Wyner [4].
One may interpret their results, which we will describe shortly, in the framework of a generalization of both their
problem setups as shown in Fig. 1. Let the random variables $X, Y, U, V$ shown take values in finite sets.

In this formulation, two agents, each having access to its own infinite stream of private randomness, observe $n$
i.i.d. copies of samples generated according to a specified law $P(x,y)$ as shown, and are required to output $nR$
samples drawn from a distribution that is close (in total variation) to the distribution constructed by taking i.i.d.
copies of a specified law $Q(u,v)$. Let the simulation capacity $R^*$ be defined as the supremum of all rates for which,
given any $\epsilon > 0$, it is possible for some $n$ to carry out this task to within total variation distance $\epsilon$.

• When $Q(u,v)$ is described by $U = V \sim \mathrm{Ber}(1/2)$, and $P(x,y)$ is a general distribution, this problem
considers fundamental limits for extracting common randomness from the distribution of $(X,Y)$. Gács and
Körner showed in [3] that we have the simulation capacity $R^* = K(X;Y)$, which has come to be known
as the Gács-Körner common information of $X$ and $Y$. This quantity $K(X;Y)$ can be described as $\sup H(\Theta)$
where $\Theta = f(X) = g(Y)$. In other words, the simulation capacity is non-zero only when the distribution
of $(X,Y)$ is decomposable, i.e. $\mathcal{X}$ may be partitioned as $\mathcal{X}_1 \cup \mathcal{X}_2$ and $\mathcal{Y}$ may be partitioned as $\mathcal{Y}_1 \cup \mathcal{Y}_2$ so
that $\Pr(X \in \mathcal{X}_1, Y \in \mathcal{Y}_2) = \Pr(X \in \mathcal{X}_2, Y \in \mathcal{Y}_1) = 0$ and $\Pr(X \in \mathcal{X}_1, Y \in \mathcal{Y}_1), \Pr(X \in \mathcal{X}_2, Y \in \mathcal{Y}_2) > 0$.
Further, they showed that in general, $K(X;Y) \le I(X;Y)$.

• When $P(x,y)$ is described by $X = Y \sim \mathrm{Ber}(1/2)$, and $Q(u,v)$ is a general distribution, this problem considers
fundamental limits for common randomness needed for generating the random variable pair $(U,V)$. Wyner
showed in [4] that the amount of common information needed for generation per sample is $1/R^* = C(U;V)$,
which has come to be known as the Wyner common information of $U$ and $V$. This quantity $C(U;V)$ can
be described as $\inf I(\Theta; U, V)$ over all $\Theta$ satisfying the Markov chain $U - \Theta - V$, with cardinality bound on the variable $\Theta$
given by $|\Theta| \le |\mathcal{U}| \cdot |\mathcal{V}|$. Further, Wyner showed that $C(U;V) \ge I(U;V)$ in general. To be precise, Wyner
considered a problem setting that required $(U,V)$ to be simulated with vanishing normalized relative entropy,
i.e. if $q_{U^n V^n}$ is the law of the simulated samples, and $Q(u,v)$ is the target distribution, then simulation
is considered possible in Wyner's formulation if

\[ \frac{1}{n} D\Big( q_{U^n V^n}(u^n, v^n) \,\Big\|\, \prod_{i=1}^{n} Q(u_i, v_i) \Big) \to 0. \tag{1} \]

It has been recognized that the simulation capacity remains the same under the vanishing total variation
constraint [5, Lemma 5], [6, Lemma IV.1]. A recent work [7] considers a variant of Wyner's problem with
exact generation of random variables as opposed to generation with a vanishing total variation distance.

Fig. 1. A generalization of the problem setups considered by Gács-Körner and Wyner

Part of this paper was presented at the 50th Annual Allerton Conference on Communications, Control and Computing 2012, Monticello,
Illinois. This document is the final version of the paper to appear in the IEEE Transactions on Information Theory.

The problem of characterizing $R^*$ is open for general distributions $P(x,y)$ and $Q(u,v)$, and so is the problem
of characterizing when $R^* > 0$.

In another stream of related work, the problem of simulation has been considered under rate-limited interaction
between the agents. This began with the work of Cuff [8], who studied communication requirements for simulating
a channel with rate-limited communication and rate-limited common randomness. Reference [9] studied communication
requirements for establishing dependence among nodes in a network setting. The former setup (of Cuff [8]) was
generalized by Gohari and Anantharam in [10] (see Fig. 2). Two agents wish to simulate i.i.d. samples of a specified
joint distribution $P(x,y,u,v)$. Nature supplies i.i.d. copies of $(X,Y)$ with the right marginal distribution as shown
and the agents can use a certain rate of common randomness, certain rate-limited communication, and infinite
streams of individual private randomness to accomplish the desired task. We want to understand the fundamental
trade-offs between these rates to make this task possible. This problem was completely solved by Yassaee, Gohari,
and Aref in [11]. However, this work does not address the problem of computing the simulation capacity $R^*$ for
the setup in Fig. 1, since the problem formulation there is different in two respects: In Fig. 2, the task is to output
$n$ samples while in Fig. 1 the task is to output $nR$ samples. Furthermore, even if $R$ were, say, chosen to be 1, in
Fig. 2 the joint distribution of the quadruple $(X^n, Y^n, U^n, V^n)$ is required to be close to i.i.d. copies of a specified
joint distribution. However, in Fig. 1 the requirement is only on the marginal distribution of the output samples
$(U^n, V^n)$ and the quadruple $(X^n, Y^n, U^n, V^n)$ need not even be close to an i.i.d. distribution.

In this paper, we consider the former non-interactive simulation setup à la Gács-Körner and Wyner (Fig. 1). Since
the problem of characterizing whether $R^* > 0$ for general distributions $P(x,y)$ and $Q(u,v)$ is also non-trivial,
we propose a relaxed problem where two agents observe an arbitrary finite number of samples drawn i.i.d. from
$P(x,y)$ as shown in Fig. 3 and are required to output one random variable each with the requirement that the
output distribution be close in total variation to a specified $Q(u,v)$. Clearly, if it is impossible to generate even a
single sample, we must have $R^* = 0$. We therefore focus on impossibility results for this problem, which will be
relevant to the formulation in Fig. 1. It is not clear if the converse is true, i.e. it is unclear whether the feasibility
of generating one sample asymptotically implies that we may generate samples at a rate $R > 0$.

Fig. 2. Generalization of Cuff's formulation by Gohari and Anantharam [10]

Note that the notion of simulation we consider is distinct from the notion of exact generation wherein a certain 
distribution is required to be generated exactly. If we have a strategic setting, such as a distributed game, in which a 
player, represented by a number of distributed agents, is playing against an adversary, the agents would often need 
to generate a joint distribution exactly to avoid providing unforeseen strategic advantages to the adversary. 




Fig. 3. The non-interactive simulation problem considered in this paper


When $(U,V) \sim Q(u,v)$ is described by $U = V \sim \mathrm{Ber}(1/2)$ while $P(x,y)$ is a general distribution, the problem
has recently come to be called non-interactive correlation distillation [12], [13], [14]. We therefore call our formulation
the problem of non-interactive simulation of joint distributions. In a remarkable strengthening of the Gács-Körner
result [3], Witsenhausen showed in [15] that unless the Gács-Körner common information $K(X;Y)$ is positive (i.e.
the joint distribution of $(X,Y)$ is decomposable), non-interactive correlation distillation is impossible to achieve.
The chief tool used in Witsenhausen's proof is the maximal correlation of two random variables, a quantity which
will be of prime importance in the present paper as well.

The second tool that we will be using is hypercontractivity, which has found numerous applications in mathematics,
physics, and theoretical computer science. The origins of hypercontractivity lie in the early works of Bonami
[16], [17], of Nelson [18] in quantum field theory, of Gross [19], who first developed the connection to logarithmic
Sobolev inequalities, and of Beckner [20]. The meaning of hypercontractivity was broadened by Borell [21] to what
is sometimes called reverse hypercontractivity today [22]. Hypercontractivity has found powerful applications in
many fields, for example the study of influence of variables on Boolean functions [23], [24], [25] and in voting
system theory [26]. Ahlswede and Gács [27] identified the use of hypercontractivity in studying the spreading of sets
in high-dimensional product spaces. In recent works, [28] showed an equivalence between hypercontractivity and
strong data processing inequalities for Rényi divergences, [29] used hypercontractivity to show non-vanishing lower
bounds on hypothesis testing, [30] studied hypercontractivity for a noise operator that computed spherical averages
in Hamming space, [31] showed a connection between hypercontractivity and strong data processing inequalities for
mutual information, and [32] used hypercontractivity to study the mutual information between Boolean functions.
As we shall see, hypercontractivity has properties that make it naturally well-suited for studying the non-interactive
simulation problem.

Let us formally set up the non-interactive simulation problem described earlier. 

Definition 1. Let $\mathcal{X}, \mathcal{Y}, \mathcal{U}, \mathcal{V}$ denote finite sets. Given a source distribution $P(x,y)$ over $\mathcal{X} \times \mathcal{Y}$ and a target
distribution $Q(u,v)$ over $\mathcal{U} \times \mathcal{V}$, we say that non-interactive simulation of $Q(u,v)$ using $P(x,y)$ is possible, if for
any $\epsilon > 0$, there exists a positive integer $n$, a finite set $\mathcal{R}$, and functions $f : \mathcal{X}^n \times \mathcal{R} \to \mathcal{U}$, $g : \mathcal{Y}^n \times \mathcal{R} \to \mathcal{V}$ such
that

\[ d_{TV}\big( (f(X^n, M_X), g(Y^n, M_Y)); (U, V) \big) \le \epsilon \]

where $\{(X_i, Y_i)\}_{i=1}^n$ is a sequence of i.i.d. samples drawn from $P(x,y)$, $M_X, M_Y$ are uniformly distributed in $\mathcal{R}$
and are mutually independent of each other and the samples from the source, $(U,V)$ is drawn from $Q(u,v)$ and
$d_{TV}(\cdot\,;\cdot)$ is the total variation distance (defined as half the $L_1$ distance between the distributions).

For a fixed $P(x,y)$, the set of distributions $Q(u,v)$ on a fixed set $\mathcal{U} \times \mathcal{V}$ for which non-interactive simulation
is possible is precisely the closure of the set of marginal distributions of $(U,V)$ satisfying $U - X^k - Y^k - V$ for
some $k$. However, this set of distributions appears to be very hard to characterize explicitly. In this paper, we focus
on outer bounds on this set, or in other words impossibility results for non-interactive simulation.

Note that since we are interested only in determining the possibility of simulation and not in the simulation
capacity, the problem does not have any less generality if we disallow the agents from using any private randomness,
since agents can obtain as much private randomness as desired by using extended observations that are
non-overlapping in time, i.e. the agents observe $n_1 + n_2 + n_3$ symbols, they use $(X_1, \ldots, X_{n_1})$, $(Y_1, \ldots, Y_{n_1})$ respectively
as their correlated observations, Alice uses $X_{n_1+1}, \ldots, X_{n_1+n_2}$ as her private randomness, and Bob uses $Y_{n_1+n_2+1}, \ldots, Y_{n_1+n_2+n_3}$
as his private randomness. We make the choice to assume the availability of private randomness as part of the model.

We will consider two examples to motivate the focus of this study. 

A. Example 1 

Let $X$ be a uniform Bernoulli random variable, $X \sim \mathrm{Ber}(\frac{1}{2})$. Let $Y$ be a noisy copy of $X$, i.e. $Y = X \oplus N$
where $N \sim \mathrm{Ber}(\alpha)$ for $0 \le \alpha \le \frac{1}{2}$ is independent of $X$. Here, the addition is modulo 2. We say that $(X,Y)$
has the doubly symmetric binary source distribution with parameter $\alpha$, denoted DSBS($\alpha$) following the notation
of Wyner [4]. We consider $(U,V) \sim \mathrm{DSBS}(\beta)$ for $0 \le \beta \le \frac{1}{2}$. We may ask whether non-interactive simulation of
$Q(u,v) = \mathrm{DSBS}(\beta)$ using $P(x,y) = \mathrm{DSBS}(\alpha)$ is possible. Witsenhausen answered this question in the negative
when $\beta < \alpha$ in [15], thus significantly strengthening the result of Gács and Körner [3]. Witsenhausen established this
by proving the tensorization of the maximal correlation of an arbitrary pair of random variables (both tensorization
and maximal correlation are defined and discussed in Section II-A). This can be used to conclude that if
non-interactive simulation is possible, then the maximal correlation of the target distribution can be no more than that
of the source distribution. The parameter $n$ has disappeared in this comparison thanks to the tensorization property.
The maximal correlation of a pair of binary random variables distributed as DSBS($\alpha$) equals $|1 - 2\alpha|$. Thus, for
instance, if the non-interactive simulation of DSBS($\beta$) using DSBS($\alpha$) is possible, with $0 \le \alpha, \beta \le \frac{1}{2}$, then we
must have $\alpha \le \beta$. Furthermore, it is easy to see that if $\alpha \le \beta$, then non-interactive simulation is indeed possible:
Alice outputs the first bit of her observation while Bob outputs a suitable noisy copy of his first bit. Thus, for
$0 \le \alpha, \beta \le \frac{1}{2}$, non-interactive simulation of DSBS($\beta$) using DSBS($\alpha$) is possible if and only if $\alpha \le \beta$.
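The achievability scheme in Example 1 can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and it assumes that Bob's "suitable noisy copy" is obtained by XOR-ing his first bit with an independent Ber($\gamma$) bit, where $\gamma$ solves $\alpha(1-\gamma) + (1-\alpha)\gamma = \beta$.

```python
def dsbs(a):
    """Joint pmf of DSBS(a) as a 2x2 nested list indexed [x][y]."""
    return [[(1 - a) / 2, a / 2], [a / 2, (1 - a) / 2]]


def extra_noise(alpha, beta):
    """Crossover probability gamma of the noise Bob adds to his bit.

    Solves alpha * (1 - gamma) + (1 - alpha) * gamma = beta.
    """
    if not 0 <= alpha <= beta <= 0.5:
        raise ValueError("need 0 <= alpha <= beta <= 1/2")
    if alpha == 0.5:
        return 0.0  # source bits are already independent; no extra noise needed
    return (beta - alpha) / (1 - 2 * alpha)


def simulated_joint(alpha, beta):
    """Joint pmf of (U, V) when U = X_1 and V = Y_1 xor Ber(gamma)."""
    gamma = extra_noise(alpha, beta)
    source = dsbs(alpha)
    joint = [[0.0, 0.0], [0.0, 0.0]]
    for x in (0, 1):
        for y in (0, 1):
            for v in (0, 1):
                # Bob keeps his bit with probability 1 - gamma, flips it otherwise.
                keep_or_flip = 1 - gamma if v == y else gamma
                joint[x][v] += source[x][y] * keep_or_flip
    return joint
```

For instance, `simulated_joint(0.1, 0.3)` should agree with `dsbs(0.3)` up to floating-point error, so a single source sample already suffices in this case.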


B. Example 2 

Let $P(x,y)$ be given by $(X,Y) \sim \mathrm{DSBS}(\alpha)$ with $0 < \alpha < \frac{1}{2}$. Consider binary random variables $(U,V)$
distributed as $Q(u,v)$ given by: $Q(0,0) = 0$, $Q(0,1) = Q(1,0) = Q(1,1) = \frac{1}{3}$. We ask if non-interactive simulation
of $Q(u,v)$ using DSBS($\alpha$) is possible. The maximal correlation of a DSBS($\alpha$) source distribution is $|1 - 2\alpha|$ while
that of $Q(u,v)$ is $\frac{1}{2}$. Since non-interactive simulation is impossible unless the maximal correlation of the source
is at least that of the target, we have non-interactive simulation impossible if $|1 - 2\alpha| < \frac{1}{2}$, i.e. $\frac{1}{4} < \alpha < \frac{1}{2}$. But what
about the case when $0 < \alpha \le \frac{1}{4}$? Can we come up with a suitable scheme to simulate $Q(u,v)$? The answer turns
out to be no for each $0 < \alpha < \frac{1}{2}$ and can be proved using the following inequality, which holds for $\{(X_i, Y_i)\}_{i=1}^n$
being i.i.d. DSBS($\alpha$), and for arbitrary sets $S, T \subseteq \{0,1\}^n$:

\[ \Pr(X^n \in S, Y^n \in T) \ge \Pr(X^n \in S)^{\frac{1}{2\alpha}} \Pr(Y^n \in T)^{\frac{1}{2\alpha}}. \tag{2} \]

The above inequality follows from a so-called reverse hypercontractive inequality [13, Thm. 3.4]. We will revisit
this inequality in Section II-C. If non-interactive simulation of $Q(u,v)$ using DSBS($\alpha$) were possible, we should
be able to find sets $S, T$ such that $\Pr(X^n \in S) \approx \Pr(Y^n \in T) \approx \frac{1}{3}$ and $\Pr(X^n \in S, Y^n \in T) \approx 0$. Inequality
(2) rules out this possibility (assuming private randomness is not available, which we had argued is without loss
of generality). Thus, hypercontractivity or reverse hypercontractivity can provide impossibility results when the
maximal correlation approach cannot. Is it true that one is always stronger than the other? One of the main results
in our paper is that hypercontractivity allows for stronger impossibility results than the maximal correlation when
$P(x,y) = \mathrm{DSBS}(\alpha)$. More generally, we give necessary and sufficient conditions on $P(x,y)$ for this subsumption.
This arises from an inequality obtained by Ahlswede and Gács [27] in the hypercontractive case, which we extend
to the reverse hypercontractive case.
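As an illustration of how inequality (2) is used, the sketch below brute-forces all non-empty pairs of sets $S, T \subseteq \{0,1\}^n$ for a small block length and reports the smallest slack between the two sides of (2), read with exponent $\frac{1}{2\alpha}$ on each factor. The function names are hypothetical and the enumeration is exponential in $2^n$, so this is only a sanity check for tiny $n$, not part of the paper's argument.

```python
from itertools import product


def dsbs_block_pmf(alpha, n):
    """pmf of (X^n, Y^n) for n i.i.d. DSBS(alpha) pairs, keyed by (x, y) tuples."""
    pmf = {}
    for x in product((0, 1), repeat=n):
        for y in product((0, 1), repeat=n):
            d = sum(a != b for a, b in zip(x, y))  # Hamming distance
            pmf[(x, y)] = alpha**d * (1 - alpha) ** (n - d) / 2**n
    return pmf


def min_slack(alpha, n):
    """Smallest value of LHS - RHS of inequality (2) over all non-empty S, T."""
    points = list(product((0, 1), repeat=n))
    pmf = dsbs_block_pmf(alpha, n)
    m = len(points)
    worst = float("inf")
    for s_mask in range(1, 1 << m):
        S = [points[i] for i in range(m) if (s_mask >> i) & 1]
        for t_mask in range(1, 1 << m):
            T = [points[i] for i in range(m) if (t_mask >> i) & 1]
            joint = sum(pmf[(x, y)] for x in S for y in T)
            p_s, p_t = len(S) / m, len(T) / m  # X^n and Y^n are uniform
            worst = min(worst, joint - (p_s * p_t) ** (1 / (2 * alpha)))
    return worst


def forced_mass_at_00(alpha):
    """Lower bound from (2) on Pr(U=0, V=0) when Pr(U=0) = Pr(V=0) = 1/3."""
    return (1 / 3 * 1 / 3) ** (1 / (2 * alpha))
```

A non-negative `min_slack` (up to rounding) is consistent with (2), and `forced_mass_at_00(alpha)` is strictly positive for every $0 < \alpha < \frac{1}{2}$, which is what clashes with the requirement $Q(0,0) = 0$.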

The rest of the paper is organized as follows. Section II discusses preliminaries on maximal correlation and
hypercontractivity. We present our main results in Section III. As mentioned earlier, one of our main results is a
necessary and sufficient condition on the source distribution $P(x,y)$ which allows one to definitively conclude that
hypercontractivity will provide stronger impossibility results than maximal correlation. As our second main result,
we give a characterization of a limiting hypercontractivity parameter (that we call $s^*$) as a strong data processing
constant for KL divergences. This characterization was first proven by Ahlswede-Gács [27]. However, our proof
has the advantage of being more intuitive, arising naturally from a Taylor series expansion, while at the same time
extending immediately to reverse hypercontractivity. This hypercontractivity parameter has recently been shown to
also be the tightest constant in strong data processing inequalities for mutual information [31]. Section IV discusses
the extension of the non-interactive simulation problem for $k \ge 3$ agents. We provide a couple of interesting
three-user non-interactive simulation examples where every two agents can simulate the corresponding pairwise marginal
of the desired joint distribution but the triple cannot simulate the triple joint distribution.


II. Main Tools: Maximal Correlation and Hypercontractivity 
In this paper, all sets are finite and all probability distributions are discrete and have finite support. We denote
the marginals of $P(x,y)$ and $Q(u,v)$ by $P_X(x), P_Y(y)$ and $Q_U(u), Q_V(v)$ respectively. We will use $\mathbb{R}_{\ge 0}$ and $\mathbb{R}_{>0}$
to denote non-negative reals and strictly positive reals respectively. In the following subsections, we will review the
definition and properties of maximal correlation and hypercontractivity.


A. Maximal Correlation 

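For jointly distributed random variables $(X,Y)$, define their maximal correlation $\rho_m(X;Y) := \sup \mathbb{E} f(X) g(Y)$
where the supremum is taken over $f : \mathcal{X} \to \mathbb{R}$, $g : \mathcal{Y} \to \mathbb{R}$ such that $\mathbb{E} f(X) = \mathbb{E} g(Y) = 0$ and $\mathbb{E} f(X)^2, \mathbb{E} g(Y)^2 \le 1$.

Example 1. If $(X,Y) \sim \mathrm{DSBS}(\alpha)$, then the only functions $f, g$ satisfying the conditions $\mathbb{E} f(X) = \mathbb{E} g(Y) = 0$
and $\mathbb{E} f(X)^2, \mathbb{E} g(Y)^2 \le 1$ are $f(x) = a(1_{x=0} - 1_{x=1})$ and $g(y) = b(1_{y=0} - 1_{y=1})$ with $|a|, |b| \le 1$, for which
$\mathbb{E} f(X) g(Y) = ab(1 - 2\alpha)$. The optimum is then achieved with $a = b = 1$ if $\alpha \le \frac{1}{2}$ and with $a = -b = 1$ if $\alpha > \frac{1}{2}$. Thus,

\[ \rho_m(X;Y) = |1 - 2\alpha|. \tag{3} \]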
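For small alphabets the maximal correlation can also be evaluated numerically. The sketch below is an illustration under an assumption not stated in this section: it relies on the standard characterization of $\rho_m(X;Y)$ as the second-largest singular value of the matrix with entries $P(x,y)/\sqrt{P_X(x) P_Y(y)}$ (whose largest singular value is always 1), and it uses a plain power iteration. The function names are hypothetical.

```python
import math


def dsbs(a):
    """Joint pmf of DSBS(a) as a 2x2 nested list indexed [x][y]."""
    return [[(1 - a) / 2, a / 2], [a / 2, (1 - a) / 2]]


def maximal_correlation(joint, iters=500):
    """Estimate rho_m for a joint pmf given as a nested list joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    xs = [i for i, p in enumerate(px) if p > 0]
    ys = [j for j, p in enumerate(py) if p > 0]
    # Subtract the top singular pair (sqrt(P_X), sqrt(P_Y)), whose singular value is 1.
    b = [[joint[i][j] / math.sqrt(px[i] * py[j]) - math.sqrt(px[i] * py[j])
          for j in ys] for i in xs]
    # Power iteration on b^T b; its largest eigenvalue is rho_m ** 2.
    v = [float(k + 1) for k in range(len(ys))]
    eig = 0.0
    for _ in range(iters):
        bv = [sum(row[j] * v[j] for j in range(len(v))) for row in b]
        w = [sum(b[i][j] * bv[i] for i in range(len(b))) for j in range(len(v))]
        norm = math.sqrt(sum(t * t for t in w))
        if norm < 1e-15:
            # b annihilates v: independent X, Y (or an unlucky start vector).
            return 0.0
        eig = norm / math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in w]
    return math.sqrt(eig)
```

With this sketch, `maximal_correlation(dsbs(0.1))` should return approximately $0.8 = |1 - 2 \cdot 0.1|$, in agreement with (3).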

The following properties of the maximal correlation of two discrete random variables with finite support can be
shown easily [33].
1) $0 \le \rho_m(X;Y) \le 1$.

2) $\rho_m(X;Y) = 0$ if and only if $X$ is independent of $Y$.

3) $\rho_m(X;Y) = 1$ if and only if the Gács-Körner common information $K(X;Y) > 0$, i.e. if and only if $(X,Y)$
is decomposable.

The three key properties of maximal correlation that are useful for the non-interactive simulation problem are as 
follows: 

• (data processing inequality) For any functions $\phi, \psi$, $\rho_m(X;Y) \ge \rho_m(\phi(X); \psi(Y))$.

• (tensorization) If $(X_1, Y_1), (X_2, Y_2)$ are independent, then $\rho_m(X_1, X_2; Y_1, Y_2) = \max\{\rho_m(X_1; Y_1), \rho_m(X_2; Y_2)\}$
[15, Thm. 1].

• (lower semi-continuity) (Recall that if $\mathcal{U}$ is a metric space, $u$ is a point in $\mathcal{U}$ and $f : \mathcal{U} \to \mathbb{R}$ is a real-valued
function, then we say $f$ is lower semi-continuous at $u$ if $u_n \to u$ implies $\liminf_n f(u_n) \ge f(u)$.) If the space
of probability distributions on $\mathcal{X} \times \mathcal{Y}$ is endowed with the total variation distance metric, then $\rho_m(X;Y)$ is a
lower semi-continuous function of the joint distribution $P(x,y)$. [An example will be provided to show that
$\rho_m$ is not a continuous function of the joint distribution.]

To keep the paper self-contained, proofs of these properties are sketched in Appendix A. Now, using the above
three properties, maximal correlation can be used to prove impossibility results for the non-interactive simulation
problem.


Observation 1. Non-interactive simulation of $(U,V) \sim Q(u,v)$ using $(X,Y) \sim P(x,y)$ is
possible only if $\rho_m(X;Y) \ge \rho_m(U;V)$.

Proof: Suppose non-interactive simulation of $(U,V) \sim Q(u,v)$ using $(X,Y) \sim P(x,y)$ is possible. This means
there exists a sequence of integers $(k_n : n \ge 1)$, a sequence of finite alphabets $\mathcal{R}_n$, and a sequence of functions
$f_n : \mathcal{X}^{k_n} \times \mathcal{R}_n \to \mathcal{U}$, $g_n : \mathcal{Y}^{k_n} \times \mathcal{R}_n \to \mathcal{V}$, such that if $\{(X_i, Y_i)\}_{i=1}^{k_n}$ are drawn i.i.d. $P(x,y)$ and $M_X, M_Y$ are
uniformly distributed in $\mathcal{R}_n$, with $\{(X_i, Y_i)\}_{i=1}^{k_n}, M_X, M_Y$ mutually independent, and $U_n = f_n(X^{k_n}, M_X)$, $V_n = g_n(Y^{k_n}, M_Y)$,
then $d_{TV}((U_n, V_n); (U, V)) \to 0$ as $n \to \infty$. We therefore have

\begin{align}
\rho_m(U_n; V_n) &\le \rho_m(X^{k_n}, M_X; Y^{k_n}, M_Y) && \text{(Data Processing Inequality)} \tag{4} \\
&= \max\{\rho_m(X_1; Y_1), \rho_m(X_2; Y_2), \ldots, \rho_m(X_{k_n}; Y_{k_n}), \rho_m(M_X; M_Y)\} && \text{(Tensorization)} \tag{5} \\
&= \max\{\rho_m(X_1; Y_1), 0\} \tag{6} \\
&= \rho_m(X;Y). \tag{7}
\end{align}

By lower semi-continuity of $\rho_m$, $d_{TV}((U_n, V_n); (U, V)) \to 0$ implies

\[ \rho_m(U;V) \le \liminf_{n \to \infty} \rho_m(U_n; V_n) \le \rho_m(X;Y). \]

□


B. Hypercontractivity 

Definition 2. For any real-valued random variable $W$ with finite support, and any real number $p$, define

\[ \|W\|_p := \begin{cases} \left( \mathbb{E} |W|^p \right)^{1/p}, & p \ne 0; \\ \exp\left( \mathbb{E} \log |W| \right), & p = 0, \end{cases} \tag{8} \]

with the understanding that for $p \le 0$, $\|W\|_p = 0$ if $\Pr(|W| = 0) > 0$.

$\|W\|_p$ is continuous and non-decreasing in $p$. If $W$ is not almost surely a constant, then $\|W\|_p$ is strictly
increasing for $p \ge 0$. If in addition, $\Pr(|W| = 0) = 0$, then $\|W\|_p$ is strictly increasing for all $p$.
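Definition 2 translates directly into code. The following minimal sketch (the function name and the list-based representation of $W$ are assumptions made for illustration) evaluates $\|W\|_p$ for a finitely supported random variable, including the conventions for $p = 0$ and for $p \le 0$ when $W$ has an atom at zero.

```python
import math


def p_norm(values, probs, p):
    """||W||_p as in Definition 2, for W equal to values[i] with probability probs[i]."""
    support = [(abs(v), pr) for v, pr in zip(values, probs) if pr > 0]
    if p <= 0 and any(a == 0 for a, _ in support):
        return 0.0  # convention: an atom at zero forces the norm to 0 for p <= 0
    if p == 0:
        # Geometric mean: exp(E log |W|)
        return math.exp(sum(pr * math.log(a) for a, pr in support))
    return sum(pr * a**p for a, pr in support) ** (1 / p)
```

For $W$ uniform on $\{1, 4\}$ this gives the harmonic mean $1.6$ at $p = -1$, the geometric mean $2$ at $p = 0$ and the arithmetic mean $2.5$ at $p = 1$, illustrating that $\|W\|_p$ is non-decreasing in $p$.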

Definition 3. For any real $p \ne 0, 1$, define its Hölder conjugate $p'$ by $\frac{1}{p} + \frac{1}{p'} = 1$. For $p = 0$, define $p' = 0$.

Suppose $X, Y$ are real-valued random variables with finite support. We write $X \ge 0$ if $\Pr(X \ge 0) = 1$. The
following are well-known [34]:

• (Minkowski's inequality) For $p \ge 1$, $\|X + Y\|_p \le \|X\|_p + \|Y\|_p$.

• (Reverse Minkowski's inequality) For $p \le 1$ and $X, Y \ge 0$, $\|X + Y\|_p \ge \|X\|_p + \|Y\|_p$.

• (Hölder's inequality) For $p > 1$, $\mathbb{E}[XY] \le \|X\|_{p'} \|Y\|_p$.

• (Reverse Hölder's inequality) For $p < 1$ and $X, Y \ge 0$, $\mathbb{E}[XY] \ge \|X\|_{p'} \|Y\|_p$.

Definition 4. For a pair of random variables $(X,Y) \sim P(x,y)$ on $\mathcal{X} \times \mathcal{Y}$, we say $(X,Y)$ is $(p,q)$-hypercontractive
if one of the following holds:

• $1 \le q \le p$, and

\[ \|\mathbb{E}[g(Y)|X]\|_p \le \|g(Y)\|_q \quad \forall g : \mathcal{Y} \to \mathbb{R}; \tag{9} \]

(If $h(Y) = |g(Y)|$, then $-\mathbb{E}[h(Y)|X] \le \mathbb{E}[g(Y)|X] \le \mathbb{E}[h(Y)|X]$ pointwise, thus we may equivalently restrict
$g$ to map to $\mathbb{R}_{\ge 0}$. If $W_n$ supported on at most $k$ values (for some fixed $k$) converges to $W$ in distribution,
then $\|W_n\|_p \to \|W\|_p$ for any $p$, so we may further equivalently restrict $g$ to map to $\mathbb{R}_{>0}$.)

• $1 \ge q \ge p$, and

\[ \|\mathbb{E}[g(Y)|X]\|_p \ge \|g(Y)\|_q \quad \forall g : \mathcal{Y} \to \mathbb{R}_{\ge 0}. \tag{10} \]

(If $W_n$ supported on at most $k$ values (for some fixed $k$) converges to $W$ in distribution, then $\|W_n\|_p \to \|W\|_p$
for any $p$, so we may equivalently restrict $g$ to map to $\mathbb{R}_{>0}$.)

Note that in the conventional definitions in (9) and (10), we have functions taking values in $\mathbb{R}$ and $\mathbb{R}_{\ge 0}$ respectively.
As explained above, for (9), we may restrict to functions taking values in $\mathbb{R}_{>0}$. However, in (10), the functions must
take non-negative values. This is conventional and necessary in various "reverse" inequalities such as the reverse
Minkowski and reverse Hölder inequalities.

Define the hypercontractivity ribbon $\mathcal{R}(X;Y)$ as the set of pairs $(p,q)$ for which $(X,Y)$ is $(p,q)$-hypercontractive.

It is easy to check that the inequalities (9), (10) always hold for $p = q$. The conditional expectation operator
is thus always contractive when $p \ge 1$, and reverse contractive for positive-valued functions when $p \le 1$. For
random variables $(X,Y)$ with a specific distribution $P(x,y)$, the operator may be hypercontractive (i.e. more than
contractive) in this precise sense. $\mathcal{R}(X;Y)$ is a region in $\mathbb{R}^2$ pinching to a point at $(1,1)$ resembling a ribbon,
explaining our choice of the name (see Fig. 4). Inequality (10) is also referred to as reverse hypercontractivity in
the literature [22].

Fig. 4. The hypercontractivity ribbon $\mathcal{R}(X;Y)$ is the shaded region. Also shown is a straight line of slope $\rho_m^2(X;Y)$ through $(1,1)$
(from Thm. 1).

1) Interpretation of hypercontractivity as Hölder-contractivity: It is well-known [22] that an equivalent definition
of $\mathcal{R}(X;Y)$ can be given by observing how much the corresponding Hölder's and reverse Hölder's inequalities
may be tightened:

• For $1 \le q \le p$, $1 < p$ we have $(p,q) \in \mathcal{R}(X;Y)$ iff

\[ \mathbb{E} f(X) g(Y) \le \|f(X)\|_{p'} \|g(Y)\|_q \quad \forall f : \mathcal{X} \to \mathbb{R}, \ g : \mathcal{Y} \to \mathbb{R}; \tag{11} \]

• For $1 \ge q \ge p$, $1 > p$ we have $(p,q) \in \mathcal{R}(X;Y)$ iff

\[ \mathbb{E} f(X) g(Y) \ge \|f(X)\|_{p'} \|g(Y)\|_q \quad \forall f : \mathcal{X} \to \mathbb{R}_{>0}, \ g : \mathcal{Y} \to \mathbb{R}_{>0}. \tag{12} \]

We will refer to inequalities (11), (12) as Hölder-contractive inequalities since they tighten Hölder's inequality
(using the knowledge that $X$ and $Y$ are not 'too correlated' in a suitable sense).
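To see the equivalence for $1 \ge q \ge p$, $1 > p$, observe that if (10) holds for any strictly positive-valued function
$g$, then for any fixed strictly positive-valued function $f$, we have

\begin{align}
\mathbb{E} f(X) g(Y) &= \mathbb{E}\big[ f(X) \, \mathbb{E}[g(Y)|X] \big] \tag{13} \\
&\ge \|f(X)\|_{p'} \, \|\mathbb{E}[g(Y)|X]\|_p && \text{(Reverse Hölder's inequality and } \mathbb{E}[g(Y)|X] > 0\text{)} \tag{14} \\
&\ge \|f(X)\|_{p'} \, \|g(Y)\|_q. \tag{15}
\end{align}

Conversely, suppose (12) holds for any strictly positive-valued functions $f, g$. First assume $p \ne 0$. By fixing $g$ and
choosing $f(X) = \mathbb{E}[g(Y)|X]^{p-1}$, we get

\begin{align}
\mathbb{E}\big[ \mathbb{E}[g(Y)|X]^p \big] &= \mathbb{E}\big[ \mathbb{E}[g(Y)|X]^{p-1} g(Y) \big] \tag{16} \\
&\ge \big\| \mathbb{E}[g(Y)|X]^{p-1} \big\|_{p'} \, \|g(Y)\|_q \tag{17} \\
&= \big( \mathbb{E}\big[ \mathbb{E}[g(Y)|X]^p \big] \big)^{1 - \frac{1}{p}} \, \|g(Y)\|_q. \tag{18}
\end{align}

Since $\mathbb{E}[g(Y)|X] > 0$, we obtain $\|\mathbb{E}[g(Y)|X]\|_p \ge \|g(Y)\|_q$.

Now, consider the case $p = 0$. If (12) holds for any strictly positive-valued functions $f, g$ with $p = p' = 0$, then
by monotonicity of $\|\cdot\|_r$ in $r$, we also have, for every $\epsilon > 0$,

\[ \mathbb{E} f(X) g(Y) \ge \|f(X)\|_{-\epsilon} \|g(Y)\|_q \quad \forall f : \mathcal{X} \to \mathbb{R}_{>0}, \ g : \mathcal{Y} \to \mathbb{R}_{>0}. \tag{19} \]

By our previous argument (applied with the exponent whose Hölder conjugate is $-\epsilon$, namely $\frac{\epsilon}{1+\epsilon}$), this gives
$\|\mathbb{E}[g(Y)|X]\|_{\frac{\epsilon}{1+\epsilon}} \ge \|g(Y)\|_q$. Since this holds for each $\epsilon > 0$, we get from
continuity of $\|\cdot\|_p$ in $p$ that $\|\mathbb{E}[g(Y)|X]\|_0 \ge \|g(Y)\|_q$.

The equivalence for the case $1 \le q \le p$, $1 < p$ is similar. We only need to note that for $(X,Y)$ to be
$(p,q)$-hypercontractive with $1 \le q \le p$, it suffices to have $\|\mathbb{E}[g(Y)|X]\|_p \le \|g(Y)\|_q$ hold only for all strictly positive
functions $g > 0$. The rest of the proof is identical.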

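2) Duality between $\mathcal{R}(X;Y)$ and $\mathcal{R}(Y;X)$: The equivalent description of $\mathcal{R}(X;Y)$ in (11), (12) immediately
gives the following duality between $\mathcal{R}(X;Y)$ and $\mathcal{R}(Y;X)$:

\[ (p,q) \in \mathcal{R}(X;Y) \iff (q',p') \in \mathcal{R}(Y;X), \quad p, q \ne 1. \tag{20} \]

$\mathcal{R}(X;Y)$ is completely specified by its non-trivial boundary $q_p^*(X;Y)$ defined for $p \ne 1$ as

\[ q_p^*(X;Y) := \begin{cases} \inf\{ q \ge 1 : \|\mathbb{E}[g(Y)|X]\|_p \le \|g(Y)\|_q \ \ \forall g : \mathcal{Y} \to \mathbb{R} \}, & p > 1; \\ \sup\{ q \le 1 : \|\mathbb{E}[g(Y)|X]\|_p \ge \|g(Y)\|_q \ \ \forall g : \mathcal{Y} \to \mathbb{R}_{\ge 0} \}, & p < 1. \end{cases} \tag{21} \]

We will find it useful to define the 'slope at $p$' by $s_p(X;Y) := \frac{q_p^*(X;Y) - 1}{p - 1}$ for $p \ne 1$.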

The following properties may be easily shown. 

1) $0 \le s_p(X;Y) \le 1$.

2) $s_p(X;Y) = 0$ if and only if $X$ is independent of $Y$. [This is a consequence of Thm. 1 and the corresponding
property for $\rho_m(X;Y)$.]

One can show that for any $p \ne 1$, $s_p(X;Y)$ satisfies the same three key properties that maximal correlation
satisfies (proofs of these properties are sketched in Appendix B).


• (data processing inequality) For any functions $\phi, \psi$, $s_p(X;Y) \ge s_p(\phi(X); \psi(Y))$.

• (tensorization) If $(X_1, Y_1), (X_2, Y_2)$ are independent, then $s_p(X_1, X_2; Y_1, Y_2) = \max\{s_p(X_1; Y_1), s_p(X_2; Y_2)\}$

|T5). 

• (lower semi-continuity) If the space of probability distributions on $\mathcal{X} \times \mathcal{Y}$ is endowed with the total variation
distance metric, then $s_p(X;Y)$ is a lower semi-continuous function of the joint distribution $P(x,y)$. [An
example will be provided to show that $s_p$ is not a continuous function of the joint distribution.]


Thus, we can use hypercontractivity to obtain impossibility results for the non-interactive simulation problem. 


Observation 2. Non-interactive simulation of $(U,V) \sim Q(u,v)$ using $(X,Y) \sim P(x,y)$ is
possible only if $s_p(X;Y) \ge s_p(U;V)$ for each $p \ne 1$, in other words, only if $\mathcal{R}(X;Y) \subseteq \mathcal{R}(U;V)$.

Example 2. A classical result states that for $(X,Y) \sim \mathrm{DSBS}(\alpha)$,

\[ \frac{q_p^*(X;Y) - 1}{p - 1} = s_p(X;Y) = (1 - 2\alpha)^2, \quad p \ne 1. \tag{22} \]

This was proved by Bonami [17] and Beckner [20, Lemma 1, Appendix Sec. 2] for $p > 1$ and by Borell [21,
Thm 3.2] for $p < 1$.
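The boundary in (22) can be spot-checked numerically. The sketch below is an illustration only: the helper names are hypothetical, and it simply evaluates both sides of (9) or (10) for a DSBS($\alpha$) pair at $q = 1 + (p - 1)(1 - 2\alpha)^2$ for a given positive function $g$ on $\{0,1\}$.

```python
import math
import random


def p_norm(values, probs, p):
    """||W||_p as in Definition 2 for a finitely supported W."""
    support = [(abs(v), pr) for v, pr in zip(values, probs) if pr > 0]
    if p <= 0 and any(a == 0 for a, _ in support):
        return 0.0
    if p == 0:
        return math.exp(sum(pr * math.log(a) for a, pr in support))
    return sum(pr * a**p for a, pr in support) ** (1 / p)


def cond_exp_dsbs(g, alpha):
    """(E[g(Y)|X=0], E[g(Y)|X=1]) for (X, Y) ~ DSBS(alpha), with g = (g(0), g(1))."""
    return ((1 - alpha) * g[0] + alpha * g[1], alpha * g[0] + (1 - alpha) * g[1])


def boundary_q(alpha, p):
    """q on the ribbon boundary according to (22)."""
    return 1 + (p - 1) * (1 - 2 * alpha) ** 2


def gap(g, alpha, p, q):
    """||g(Y)||_q - ||E[g(Y)|X]||_p; X and Y are both uniform on {0, 1}."""
    half = (0.5, 0.5)
    return p_norm(g, half, q) - p_norm(cond_exp_dsbs(g, alpha), half, p)
```

For $p > 1$ the gap at `boundary_q(alpha, p)` should be non-negative for every positive $g$, and for $p < 1$ it should be non-positive. Choosing $q$ strictly inside the boundary, for example $q = 2$ with $\alpha = 0.1$ and $p = 3$, makes the gap negative for a nearly constant $g$ such as $(1.01, 0.99)$, which is consistent with the boundary being tight.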


C. Proving impossibility results for non-interactive simulation using the hypercontractivity ribbon $\mathcal{R}(X;Y)$

In this subsection, we state explicitly a simple observation that is well-known. Suppose non-interactive simulation
of $(U,V) \sim Q(u,v)$ using $(X,Y) \sim P(x,y)$ is possible. This means there exists a sequence of integers
$(k_n : n \ge 1)$, a sequence of finite alphabets $\mathcal{R}_n$, and a sequence of functions $f_n : \mathcal{X}^{k_n} \times \mathcal{R}_n \to \mathcal{U}$,
$g_n : \mathcal{Y}^{k_n} \times \mathcal{R}_n \to \mathcal{V}$, such that if $\{(X_i, Y_i)\}_{i=1}^{k_n}$ are drawn i.i.d. $P(x,y)$ and $M_X, M_Y$ are uniformly distributed
in $\mathcal{R}_n$, with $\{(X_i, Y_i)\}_{i=1}^{k_n}, M_X, M_Y$ mutually independent, and $U_n = f_n(X^{k_n}, M_X)$, $V_n = g_n(Y^{k_n}, M_Y)$, then
$d_{TV}((U_n, V_n); (U, V)) \to 0$ as $n \to \infty$. Let $(U_n, V_n) \sim Q_n(u,v)$.

A traditional approach to prove impossibility results for non-interactive simulation is as follows. Fix $n$. Suppose
$(X,Y)$ is $(p,q)$-hypercontractive with $1 \le q \le p$. Then, by tensorization, $((X^{k_n}, M_X), (Y^{k_n}, M_Y))$ is
$(p,q)$-hypercontractive.

Consider the functions defined as: 


By using CD, we get 


fnix ",mx) — 'y ) ) 

(23) 

u&A 


'fn{y ,‘kkty) Pv^[gn{y^'^ ,my)—v]' 

(24) 

veV 


^x)f{Y^YMY) < mx'^YMx)\\p'mY''YMY)\U, 

(25) 


which is 


uGlt vCiV 



By letting n ^ oo, we get 


EE 

uCiU vCiV 



(26) 


(27) 


For any fixed we find that non-interactive simulation of {U,V) ^ Q{u,v) from {X,Y) ^ P{x,y) is 

possible only if Q satisfies the inequality 0- 

Similarly, if {X,Y) is (p, q)-hypercontractive with 1 > q > p then, for any fixed Xu, Pv > 0, non-interactive 
simulation of {U,V) ~ Q{u,v) from {X,Y) ~ P{x,y) is possible only if Q satisfies the following inequality: 

i/p' 


Indeed, (|^ is a version of ( |28l ). Let {X,Y) ~ DSBS(a). Then, {X,Y) is (—2a)-hypercontractive from 
( |22| l. Choosing Aq = po = 1) = Mi = e with e —0, we obtain Q where U — V = {Q, 1}. 

The inclusion Ti{X-,Y) C 72.(17; V) implies the collection of inequalities ( |27| ) for any choice of real {Xu}u£U, {Pv}v(^V 
and the collection of inequalities ( |28l l for any choice of positive valued {Xu}u&u,{pv}v£V- One can also easily 
show that the reverse implication from the collection of inequalities ( |27| ), ( [28l l to Ti{X-,Y) C 72(17; 17) holds (using 
the equivalent interpretation of hypercontractivity as Holder-contractivity). 

Thus, Ti{X;Y) C 72(17; 17) is powerful enough to subsume the application of all possible instantiations of A„, p,y 
in the corresponding Holder-contractive inequalities. 

The reader should note the importance of the above observation in the context of thinking abstractly about 
the hypercontractivity ribbon and its usefulness when invoking an automated computer search for proving an 




uGV 


P-lQiv) 


(28) 


EE XuP.vQ{u,v) > 


u^U vGV 


\uGU 
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impossibility of non-interactive simulation result. If non-interactive simulation of {U,V) using {X,Y) is possible, 
then any Hölder-contractive inequality satisfied by $(X,Y)$ will also be satisfied by $(U,V)$. Therefore, if any such
inequality satisfied by all functions of $X$ and $Y$ is violated by some pair of functions of $U$ and $V$, then we can
conclude non-simulability, i.e. that simulation of $(U,V)$ using $(X,Y)$ is impossible. However, violation of any
such Hölder-contractive inequality implies failure of the inclusion $\mathcal{R}(X;Y) \subseteq \mathcal{R}(U;V)$, so one can get the same
conclusion from the result that failure of the inclusion $\mathcal{R}(X;Y) \subseteq \mathcal{R}(U;V)$ implies non-simulability. Further, it
is easier to show failure of inclusion of the hypercontractivity ribbons than it is to show violation of any specific
such Hölder-contractive inequality, simply because violation of any Hölder-contractive inequality implies failure of
inclusion of the hypercontractivity ribbons but failure of inclusion of the hypercontractivity ribbons just implies
that some Hölder-contractive inequality is violated. Thus, if one wishes to show non-simulability using a computer
search, it suffices to compute the non-trivial boundaries of the two hypercontractivity ribbons $q_p^*(X;Y)$ and $q_p^*(U;V)$
(and the corresponding $s_p(X;Y)$ and $s_p(U;V)$) and find that $s_p(X;Y) < s_p(U;V)$ for some $p \neq 1$ without ever
having to prove for some specific Hölder-contractive inequality that it is the one being violated.

To the best of our knowledge, there is no algorithm better than a brute force search following suitable discretization
to compute the hypercontractivity ribbons. However, the observation above simplifies the approach of proving an
impossibility result using instantiations of $\lambda_u$ and $\mu_v$.

III. Main Results 

In this section, we state and prove our main results. 

A. Connection between maximal correlation and the hypercontractivity ribbon 

Our first result is a geometric connection between maximal correlation and the hypercontractivity ribbon.
Theorem 1. If $(X,Y)$ is $(p,q)$-hypercontractive and $p \neq 1$, then

$$\rho_m^2(X;Y) \le \frac{q-1}{p-1}. \qquad (29)$$

Remark 1. For the case p > 1, Thm. [1] is obtained in p7) . In the current form of the statement of Thm. 
the maximal correlation is afforded a geometric meaning, namely its square is the slope of a straight line bound 
constraining the hypercontractivity ribbon (see Fig |^. For (X,Y) ~ DSBS(a), we have from and ( |22| ) that 
the hypercontractivity ribbon IZ{X\ Y) is precisely the wedge obtained by the straight lines p = q, and the straight 
line corresponding to the maximal correlation bound 

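Proof of Theorem 1. The proof uses a perturbative argument. Let $(X,Y) \sim P(x,y)$. The claim is obvious when
either $X$ or $Y$ is a constant almost surely. So, assume this is not the case and fix functions $\phi : \mathcal{X} \to \mathbb{R}$, $\psi : \mathcal{Y} \to \mathbb{R}$
such that

$$\mathbb{E}\phi(X) = \mathbb{E}\psi(Y) = 0, \quad \mathbb{E}\phi(X)^2 = \mathbb{E}\psi(Y)^2 = 1. \qquad (30)$$

Fix $r > 0$. Define $f : \mathcal{X} \to \mathbb{R}_{>0}$, $g : \mathcal{Y} \to \mathbb{R}_{>0}$ by $f(x) = 1 + \sigma\phi(x)$, $g(y) = 1 + \sigma r\psi(y)$. Note that for sufficiently
small $\sigma$, the functions $f, g$ do take only positive values. Fix $(p,q) \in \mathcal{R}(X;Y)$ with $p < 1$. We also assume $p \neq 0$,
using the standard limit argument to deal with the case $p = 0$. Using (12) with the functions $f, g$ we just defined,
we have

$$\mathbb{E}[(1 + \sigma\phi(X))(1 + \sigma r\psi(Y))] \ge \big(\mathbb{E}[(1 + \sigma\phi(X))^{p'}]\big)^{1/p'} \cdot \big(\mathbb{E}[(1 + \sigma r\psi(Y))^{q}]\big)^{1/q}. \qquad (31)$$

For $Z$ satisfying $\mathbb{E}Z = 0$, $\mathbb{E}Z^2 = 1$,

$$\big(\mathbb{E}[(1 + aZ)^{l}]\big)^{1/l} = \Big(1 + l\,a\,\mathbb{E}Z + \frac{l(l-1)}{2}\,a^2\,\mathbb{E}Z^2 + O(a^3)\Big)^{1/l} = 1 + \frac{l-1}{2}a^2 + O(a^3).$$

Using this in (31), we get

$$1 + \sigma^2 r\,\mathbb{E}[\phi(X)\psi(Y)] \ge \Big(1 + \frac{p'-1}{2}\sigma^2 + O(\sigma^3)\Big)\Big(1 + \frac{q-1}{2}\sigma^2 r^2 + O(\sigma^3)\Big).$$

Comparing the coefficient of $\sigma^2$ on both sides, we get

$$\mathbb{E}\phi(X)\psi(Y) \ge \frac{p'-1}{2r} + \frac{(q-1)\,r}{2}.$$

Noting that $p' - 1, q - 1 < 0$ and taking the supremum over all $r > 0$, we get

$$\mathbb{E}\phi(X)\psi(Y) \ge -\sqrt{\frac{q-1}{p-1}}, \quad\text{or}\quad -\mathbb{E}\phi(X)\psi(Y) \le \sqrt{\frac{q-1}{p-1}}. \qquad (32)$$

Taking the supremum over all $-\phi$ and $\psi$ satisfying (30), we get

$$\rho_m(X;Y) \le \sqrt{\frac{q-1}{p-1}}.$$

We can similarly prove the inequality in the case when $p > 1$. This completes the proof. □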
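The optimization over $r$ in the last step of the proof can be made explicit. The following worked computation is a sketch for the case $p < 1$, $q < 1$; the shorthand $a$ and $b$ is introduced here for illustration only and is not the authors' notation. It uses $p' - 1 = \frac{1}{p-1}$, which follows from $p' = \frac{p}{p-1}$.

```latex
% Shorthand (an assumption of this sketch):
%   a := -(p'-1) = 1/(1-p) > 0,   b := -(q-1) = 1-q > 0.
% Comparing the coefficients of \sigma^2 gives, for every r > 0,
%   E[\phi(X)\psi(Y)] >= -(1/2) (a/r + b r).
\begin{align*}
\inf_{r>0}\Big(\frac{a}{r} + b\,r\Big)
  &= 2\sqrt{ab}, \quad\text{attained at } r^\ast = \sqrt{a/b}, \\
\mathbb{E}[\phi(X)\psi(Y)]
  &\ge -\sqrt{ab} = -\sqrt{\frac{1-q}{1-p}} = -\sqrt{\frac{q-1}{p-1}}.
\end{align*}
```

When $q = 1$ the infimum equals $0$ (let $r \to \infty$), which agrees with the right-hand side of (29).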

The main implication of Thm. 1 for the problem of non-interactive simulation is the following corollary, which
gives a necessary and sufficient condition on the source distribution $P(x,y)$ for which Observation 2 will prove
impossibility results that are at least as strong as Observation 1. This condition is satisfied, for example, when
$P(x,y)$ is a DSBS($\epsilon$) distribution.


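Corollary 1. Fix a distribution $(X,Y) \sim P(x,y)$. Then the following are equivalent:
(a) For all $(U,V) \sim Q(u,v)$, $\mathcal{R}(X;Y) \subseteq \mathcal{R}(U;V) \implies \rho_m(X;Y) \ge \rho_m(U;V)$.

(b)

$$\rho_m(X;Y) = \inf_{(p,q) \in \mathcal{R}(X;Y),\, p \neq 1} \sqrt{\frac{q-1}{p-1}}. \qquad (33)$$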


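Proof of Corollary 1. (b) $\implies$ (a): Assume (b) holds for $P(x,y)$. If $\mathcal{R}(X;Y) \subseteq \mathcal{R}(U;V)$, then

$$\inf_{(p,q) \in \mathcal{R}(X;Y),\, p \neq 1} \sqrt{\frac{q-1}{p-1}} \ge \inf_{(p,q) \in \mathcal{R}(U;V),\, p \neq 1} \sqrt{\frac{q-1}{p-1}}.$$

Now, by hypothesis, $\inf_{(p,q) \in \mathcal{R}(X;Y),\, p \neq 1} \sqrt{\frac{q-1}{p-1}} = \rho_m(X;Y)$ and from Thm. 1 we have
$\inf_{(p,q) \in \mathcal{R}(U;V),\, p \neq 1} \sqrt{\frac{q-1}{p-1}} \ge \rho_m(U;V)$.

$\neg$(b) $\implies$ $\neg$(a): Suppose that for $(X,Y) \sim P(x,y)$, we have for some $\delta \neq 0$,

$$\rho_m(X;Y) = \inf_{(p,q) \in \mathcal{R}(X;Y),\, p \neq 1} \sqrt{\frac{q-1}{p-1}} - \delta.$$

By Theorem 1, $\delta > 0$. From (22), we know that if $(U,V) \sim$ DSBS($\epsilon$), then for any $p \neq 1$,

$$\frac{q_p^*(U;V) - 1}{p-1} = (1-2\epsilon)^2 = \rho_m(U;V)^2.$$

Choosing $\epsilon$ so that $\rho_m(U;V) = 1 - 2\epsilon = \inf_{(p,q) \in \mathcal{R}(X;Y),\, p \neq 1} \sqrt{\frac{q-1}{p-1}}$, we have $\rho_m(X;Y) < \rho_m(U;V)$ and
$\mathcal{R}(X;Y) \subseteq \mathcal{R}(U;V)$.

□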


B. Limiting chordal slope of the hypercontractivity ribbon 

Our second result proves the existence of $\lim_{p \to 1} s_p(X;Y)$ and provides a characterization of the limit in terms
of a strong data processing constant for relative entropies that was studied first in [27].

Definition 5. Let $D(\mu(z)\|\nu(z)) = \sum_z \mu(z) \log\frac{\mu(z)}{\nu(z)}$ denote the relative entropy of $\mu$ with respect to $\nu$. Consider
finite sets $\mathcal{X}$ and $\mathcal{Y}$, and let $P(x,y)$ be a joint distribution over the product set $\mathcal{X} \times \mathcal{Y}$. Let $R_X(x)$ be an arbitrary
probability distribution on $\mathcal{X}$. Let $R_Y(y)$ be the probability distribution on $\mathcal{Y}$ whose probability mass at $y$ is
$\sum_x R_X(x) P_{Y|X}(y|x)$. If $(X,Y) \sim P_{X,Y}(x,y)$, then define the strong data processing constant for relative entropies
corresponding to $(X,Y)$ as

$$s^*(X;Y) := \sup \frac{D(R_Y(y)\|P_Y(y))}{D(R_X(x)\|P_X(x))},$$

where the supremum is taken over all $R_X(x)$ satisfying $R_X(x) \neq P_X(x)$ and $R_X(x) \ll P_X(x)$.
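As a numerical illustration of Definition 5, the sketch below approximates $s^*(X;Y)$ for a DSBS($\alpha$) source by scanning binary input distributions $R_X = (1-t, t)$. The function names and the grid resolution are illustrative assumptions, not part of the paper. For this source the scanned ratio should stay below $(1-2\alpha)^2$ and approach it as $R_X \to P_X$, in line with (22) and Theorem 2.

```python
import math


def kl(r, p):
    """Relative entropy D(r || p) in nats for finite distributions (0 log 0 := 0)."""
    return sum(ri * math.log(ri / pi) for ri, pi in zip(r, p) if ri > 0)


def sdpi_ratio_dsbs(alpha, t):
    """D(R_Y || P_Y) / D(R_X || P_X) for a DSBS(alpha) source and R_X = (1 - t, t)."""
    p_x = [0.5, 0.5]
    p_y = [0.5, 0.5]
    r_x = [1.0 - t, t]
    # Push R_X through the binary symmetric channel with crossover probability alpha.
    r_y1 = t * (1.0 - alpha) + (1.0 - t) * alpha
    r_y = [1.0 - r_y1, r_y1]
    return kl(r_y, p_y) / kl(r_x, p_x)


def sdpi_grid_dsbs(alpha, steps=2000):
    """Grid approximation of s*(X;Y); the point t = 1/2 (R_X = P_X) is excluded."""
    ts = [k / steps for k in range(steps + 1) if 2 * k != steps]
    return max(sdpi_ratio_dsbs(alpha, t) for t in ts)


if __name__ == "__main__":
    alpha = 0.1
    print(sdpi_grid_dsbs(alpha), (1 - 2 * alpha) ** 2)
```

A grid search only gives a lower estimate of the supremum; here the supremum is approached near $t = 1/2$, so a finer grid around that point tightens the estimate.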


Remark 2. In a recent work 0, it is shown that s* is also the tightest constant for data processing inequalities 
involving mutual information in Markov chains:

$$s^*(X;Y) = \sup_{U :\, U - X - Y} \frac{I(U;Y)}{I(U;X)}.$$


Our result can be stated as follows. 


Theorem 2.

$$\lim_{p \to 1} s_p(X;Y) = \lim_{p \to 1} \frac{q_p^*(X;Y) - 1}{p - 1} = s^*(Y;X). \qquad (34)$$

The proof of Thm. 2 follows from a natural Taylor series calculation, and can be found in Appendix C. The
following corollary shows that $\lim_{p \to \infty} s_p(X;Y) = \lim_{p \to -\infty} s_p(X;Y) = s^*(X;Y)$. The former was established
in [27] while the latter result is new. We believe that using Theorems 1 and 2 we acquire a more intuitive proof of
the result $\lim_{p \to \infty} s_p(X;Y) = s^*(X;Y)$ that was obtained in [27], while also showing the reverse hypercontractive
case: $\lim_{p \to -\infty} s_p(X;Y) = s^*(X;Y)$.


Corollary 2.

$$\lim_{p \to \infty} \frac{q_p^*(X;Y) - 1}{p - 1} = \lim_{p \to -\infty} \frac{q_p^*(X;Y) - 1}{p - 1} = s^*(X;Y). \qquad (35)$$


The proof of Corollary 2 is in Appendix C. Corollary 3, which follows immediately from Corollary 1, Thm. 2
and Corollary 2, provides a sufficient condition for (33) to hold.

Corollary 3. If $\rho_m(X;Y) = \min\{\sqrt{s^*(X;Y)}, \sqrt{s^*(Y;X)}\}$, then for any $(U,V) \sim Q(u,v)$, we have

$$\mathcal{R}(X;Y) \subseteq \mathcal{R}(U;V) \implies \rho_m(X;Y) \ge \rho_m(U;V).$$


Note that from ([^, ( |2^ and Thm.|^ DSBS sources always satisfy the condition in Corollary One can also show 
that the condition holds for source distributions corresponding to the input-output pair resulting from a uniformly 
distributed input into a binary input symmetric output channel. The above ideas suggest that for a recent conjecture 
regarding Boolean functions p5j , hypercontractivity is going to be a more useful tool than maximal correlation. 
Indeed, evidence for this can be found in where usage of s* helps in an automated proof of an inequality that 
cannot be proved using maximal correlation. 

Example 3. Suppose we choose $P(x,y)$ to be DSBS($\epsilon$), and $Q(u,v)$ specified by $Q(U = 1) = s$, $Q(V = 1|U = 0) = c$,
$Q(V = 0|U = 1) = d$. For certain values of $s, c, d$, non-interactive simulation is possible and for others, it
is impossible. For fixed values of $s$, this is shown graphically in Fig. 5.
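The maximal correlation half of this classification is easy to reproduce. The sketch below builds $Q$ from $(s, c, d)$ and applies the necessary condition $\rho_m(X;Y) \ge \rho_m(U;V)$ with $\rho_m(X;Y) = 1 - 2\epsilon$ for a DSBS($\epsilon$) source. It uses the closed form $\rho_m^2 = -1 + \sum_{u,v} Q(u,v)^2 / (Q(u)Q(v))$, which is valid when at least one variable is binary. The function names are illustrative assumptions, and the hypercontractivity test via $s_p$ is not implemented here.

```python
def target_joint(s, c, d):
    """Joint pmf Q(u, v) with Q(U=1)=s, Q(V=1|U=0)=c, Q(V=0|U=1)=d."""
    return {
        (0, 0): (1 - s) * (1 - c),
        (0, 1): (1 - s) * c,
        (1, 0): s * d,
        (1, 1): s * (1 - d),
    }


def rho_m_squared_binary(q):
    """Squared maximal correlation of a pair of binary variables with joint pmf q."""
    q_u = {u: q[(u, 0)] + q[(u, 1)] for u in (0, 1)}
    q_v = {v: q[(0, v)] + q[(1, v)] for v in (0, 1)}
    # Terms with zero mass contribute nothing; skipping them avoids 0/0.
    total = sum(p * p / (q_u[u] * q_v[v]) for (u, v), p in q.items() if p > 0)
    return total - 1.0


def ruled_out_by_maximal_correlation(s, c, d, eps):
    """True if rho_m(U;V) > rho_m(X;Y) = 1 - 2*eps, so simulation is impossible."""
    return rho_m_squared_binary(target_joint(s, c, d)) > (1 - 2 * eps) ** 2


if __name__ == "__main__":
    print(ruled_out_by_maximal_correlation(0.5, 0.1, 0.1, 0.2))
    print(ruled_out_by_maximal_correlation(0.5, 0.3, 0.3, 0.2))
```

A `False` result only means that this particular test is inconclusive; it does not show that the point can be simulated.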

IV. Non-Interactive Simulation with $k \ge 3$ Agents
The non-interactive simulation problem we have considered can be naturally extended to $k$ agents.

Definition 6. Let $\mathcal{X}_i, \mathcal{U}_i$ denote finite sets for $i = 1, 2, \ldots, k$. Given a source distribution $P(x_1, x_2, \ldots, x_k)$
over $\prod_{i=1}^k \mathcal{X}_i$ and a target distribution $Q(u_1, u_2, \ldots, u_k)$ over $\prod_{i=1}^k \mathcal{U}_i$, we say that non-interactive simulation of
$Q(u_1, u_2, \ldots, u_k)$ using $P(x_1, x_2, \ldots, x_k)$ is possible if for any $\epsilon > 0$, there exists a positive integer $n$, a finite set
$\mathcal{R}$ and functions $f_i : \mathcal{X}_i^n \times \mathcal{R} \to \mathcal{U}_i$ for $i = 1, 2, \ldots, k$ such that

$$d_{TV}\big((f_1(X_1^n, M_1), f_2(X_2^n, M_2), \ldots, f_k(X_k^n, M_k)); (U_1, U_2, \ldots, U_k)\big) \le \epsilon$$

where $\{(X_{1,j}, X_{2,j}, \ldots, X_{k,j})\}_{j=1}^n$ is a sequence of i.i.d. samples drawn from $P(x_1, x_2, \ldots, x_k)$, $M_1, M_2, \ldots, M_k$
are uniformly distributed in $\mathcal{R}$, mutually independent of each other and of the samples drawn from the source,
$(U_1, U_2, \ldots, U_k)$ is drawn from $Q(u_1, u_2, \ldots, u_k)$, and $d_{TV}(\cdot\,;\cdot)$ is the total variation distance.

Fig. 5. Panels: (a) $s = 0.3, \epsilon = 0.2$; (b) $s = 0.3, \epsilon = 0.4$; (c) $s = 0.5, \epsilon = 0.2$. Suppose the source distribution $P(x,y)$ is DSBS($\epsilon$), and the target distribution is $Q(u,v)$ specified by $Q(U = 1) = s$, $Q(V = 1|U = 0) = c$, $Q(V = 0|U = 1) = d$. The plots show restrictions on the space of distributions $(s, c, d)$ that can be simulated. The X co-ordinate represents $c$ and the Y co-ordinate represents $d$. In each plot, we fix $s, \epsilon$ as specified and $p = 1.5$. The blue region indicates $\rho_m^2 \le s_p \le (1-2\epsilon)^2$, the green region indicates $\rho_m^2 \le (1-2\epsilon)^2 < s_p$ and finally, the red region indicates $(1-2\epsilon)^2 < \rho_m^2 \le s_p$. Thus, with $p = 1.5$, the red region is ruled out as impossible by $\rho_m$ and $s_p$, the green region is ruled out by $s_p$, and the blue region is ruled out by neither $\rho_m$ nor by $s_p$. Note that this does not mean all points in the blue region can be simulated by suitable choice of functions, only that our tools (using this particular choice of $p$) fail to prove impossibility for those points. Note that along the $c = d$ line, $(U,V)$ is a DSBS source as well, so both maximal correlation and hypercontractivity (for any $p$) give an impossibility result if and only if $c < \epsilon$ or $c > 1 - \epsilon$ in accordance with Sec. I-A.

In this section, we make simple observations about how hypercontractivity and maximal correlation may be used to
prove impossibility results for this non-interactive simulation problem with $k$ agents. For any set $A \subseteq \{1, 2, \ldots, k\}$,
let us use the notation $X_A := (X_i : i \in A)$, $U_A := (U_i : i \in A)$.

Recall that for the case of two random variables $(X,Y)$ and $1 \le q \le p$, we have $(p,q) \in \mathcal{R}(X;Y)$ if either of
the two following equivalent conditions hold:


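• $\|\mathbb{E}[g(Y)|X]\|_p \le \|g(Y)\|_q \quad \forall g : \mathcal{Y} \to \mathbb{R}$;

• $\mathbb{E}f(X)g(Y) \le \|f(X)\|_{p'}\|g(Y)\|_q \quad \forall f : \mathcal{X} \to \mathbb{R},\ \forall g : \mathcal{Y} \to \mathbb{R}$.

Similarly, for $1 \ge q \ge p$, we have $(p,q) \in \mathcal{R}(X;Y)$ if either of the two following equivalent conditions hold:

• $\|\mathbb{E}[g(Y)|X]\|_p \ge \|g(Y)\|_q \quad \forall g : \mathcal{Y} \to \mathbb{R}_{>0}$;

• $\mathbb{E}f(X)g(Y) \ge \|f(X)\|_{p'}\|g(Y)\|_q \quad \forall f : \mathcal{X} \to \mathbb{R}_{>0},\ \forall g : \mathcal{Y} \to \mathbb{R}_{>0}$.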

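We can define a Hölder-contraction region $\mathcal{H}(X;Y)$ by observing how much Hölder's inequality and the reverse
Hölder's inequality may be tightened. Define $(p_1, p_2) \in \mathcal{H}(X;Y)$ if

• $p_1, p_2 \ge 1$, and $\forall f : \mathcal{X} \to \mathbb{R}$, $\forall g : \mathcal{Y} \to \mathbb{R}$, we have $\mathbb{E}f(X)g(Y) \le \|f(X)\|_{p_1}\|g(Y)\|_{p_2}$;

• $p_1, p_2 \le 1$, and $\forall f : \mathcal{X} \to \mathbb{R}_{>0}$, $\forall g : \mathcal{Y} \to \mathbb{R}_{>0}$, we have $\mathbb{E}f(X)g(Y) \ge \|f(X)\|_{p_1}\|g(Y)\|_{p_2}$.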

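This prompts a natural extension to $k$ random variables using the $k$-random variable Hölder inequalities. The most
general Hölder and reverse Hölder inequalities for $k$ random variables are respectively given by:

$$\mathbb{E}\prod_{i=1}^k W_i \le \prod_{i=1}^k \|W_i\|_{p_i}, \quad p_i \ge 1,\ \sum_{i=1}^k \frac{1}{p_i} = 1, \qquad (36)$$

$$\mathbb{E}\prod_{i=1}^k W_i \ge \prod_{i=1}^k \|W_i\|_{p_i}, \quad p_i \le 1,\ p_i \neq 0,\ \text{exactly one } p_i > 0,\ \sum_{i=1}^k \frac{1}{p_i} = 1,\ W_i \ge 0. \qquad (37)$$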

Proof of Hölder and reverse Hölder inequalities. By the weighted arithmetic mean-geometric mean inequality, we
have for any real numbers $y_1, y_2, \ldots, y_k \ge 0$, and $p_1, p_2, \ldots, p_k \ge 1$ satisfying $\sum_{i=1}^k \frac{1}{p_i} = 1$,

$$\prod_{i=1}^k y_i \le \sum_{i=1}^k \frac{y_i^{p_i}}{p_i}. \qquad (38)$$

Setting $y_i = \frac{|W_i|}{\|W_i\|_{p_i}}$ and taking expectations gives the Hölder inequality.

Now, if $0 < p_1 < 1$, $p_2, p_3, \ldots, p_k < 0$ satisfying $\sum_{i=1}^k \frac{1}{p_i} = 1$, we may set $q_1 = \frac{1}{p_1}$, $q_i = -\frac{p_i}{p_1}$, $i = 2, 3, \ldots, k$,
so that $q_i \ge 1$ and $\sum_{i=1}^k \frac{1}{q_i} = 1$. Using (38) with $q_i$'s, we get

$$\prod_{i=1}^k y_i \le p_1 y_1^{1/p_1} + \sum_{i=2}^k \Big(-\frac{p_1}{p_i}\Big) y_i^{-p_i/p_1}. \qquad (39)$$

For any $x_1, x_2, \ldots, x_k > 0$, choose $y_1 = \prod_{i=1}^k x_i^{p_1}$ and $y_i = x_i^{-p_1}$ for $i = 2, 3, \ldots, k$, to get

$$\sum_{i=1}^k \frac{x_i^{p_i}}{p_i} \le \prod_{i=1}^k x_i. \qquad (40)$$

Setting $x_i = \frac{W_i}{\|W_i\|_{p_i}}$ and taking expectations proves the reverse Hölder inequality for $W_i > 0$ almost surely,
$i = 1, 2, \ldots, k$. If $W_i \ge 0$, we can set $W_i' = W_i + \epsilon$ and let $\epsilon \to 0$ to complete the proof.

□
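The two inequalities (36) and (37) can be sanity-checked numerically on a finite probability space. The sketch below is an illustration under stated assumptions: the helper names, the uniform probability space, and the exponent choices $(3,3,3)$ and $(\frac12, -2, -2)$ are picked here for the example (both satisfy $\sum_i 1/p_i = 1$, and the second has exactly one positive exponent).

```python
import math
import random


def lp_norm(values, probs, p):
    """||W||_p = (E[W^p])^(1/p) for a strictly positive W and real p != 0."""
    return sum(pr * v ** p for v, pr in zip(values, probs)) ** (1.0 / p)


def expected_product(ws, probs):
    """E[W_1 W_2 ... W_k] when all W_i are defined on the same finite space."""
    return sum(pr * math.prod(w[j] for w in ws) for j, pr in enumerate(probs))


def holder_sides(ws, probs, exponents):
    """Return (E[prod W_i], prod ||W_i||_{p_i}) for the given exponents."""
    lhs = expected_product(ws, probs)
    rhs = math.prod(lp_norm(w, probs, p) for w, p in zip(ws, exponents))
    return lhs, rhs


def random_instance(k, atoms, seed):
    """k strictly positive random variables on a uniform space with `atoms` points."""
    rng = random.Random(seed)
    probs = [1.0 / atoms] * atoms
    ws = [[rng.uniform(0.1, 5.0) for _ in range(atoms)] for _ in range(k)]
    return ws, probs


if __name__ == "__main__":
    ws, probs = random_instance(3, 50, seed=0)
    print("forward:", holder_sides(ws, probs, (3.0, 3.0, 3.0)))   # lhs <= rhs
    print("reverse:", holder_sides(ws, probs, (0.5, -2.0, -2.0)))  # lhs >= rhs
```

Keeping the variables bounded away from zero avoids overflow in the negative-exponent norms; the case $W_i \ge 0$ is handled in the proof by the limiting argument.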


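Remark 3. Both Hölder and reverse Hölder inequalities can also be proved by recursively invoking the inequalities
for two variables. As a demonstration, fix any $0 < p, q < 1$. For any non-negative real-valued $W_1, W_2, W_3$,

$$\mathbb{E}W_1W_2W_3 \ge \|W_1W_2\|_p\,\|W_3\|_{-\frac{p}{1-p}} = \big(\mathbb{E}(W_1W_2)^p\big)^{1/p}\,\|W_3\|_{-\frac{p}{1-p}} \ge \Big(\|W_1^p\|_q\,\|W_2^p\|_{-\frac{q}{1-q}}\Big)^{1/p}\,\|W_3\|_{-\frac{p}{1-p}} = \|W_1\|_{pq}\,\|W_2\|_{-\frac{pq}{1-q}}\,\|W_3\|_{-\frac{p}{1-p}}.$$

It is easy to check that any reverse Hölder inequality may be obtained in this way by suitable choice of $p, q$.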
Remark 4. The reverse Hölder inequality will also hold if some of the $p_i$ were equal to zero as long as the point
$(p_1, p_2, \ldots, p_k)$ is the limit of points satisfying $p_i \le 1$, $p_i \neq 0$, exactly one $p_i > 0$, $\sum_{i=1}^k \frac{1}{p_i} = 1$. In particular, if

we set for any integer M > 1 , p[^'’ = = pi^^ = ...= p[^^ = then {p[^\p 2 ,^^^ ,... ,pi^^) 

is a legitimate choice for the reverse Hölder's inequality. Taking the limit as $M \to \infty$, we get the inequality
$\mathbb{E}\prod_{i=1}^k W_i \ge \prod_{i=1}^k \|W_i\|_0$, which is also valid for all random variables $W_i \ge 0$ and is a reverse Hölder's inequality.
Remark 5. The restriction in the reverse Hölder inequality that exactly one $p_i > 0$ is necessary. If no such $p_i$ exists, then
the inequality is a consequence of $\mathbb{E}\prod_{i=1}^k W_i \ge \prod_{i=1}^k \|W_i\|_0$ in the previous remark and the monotonicity of norms.
On the other hand, if more than one such $p_i$ exists, say $p_1, p_2 > 0$, then we can choose any mutually exclusive
events $A, B$ such that $P(A \cap B) = 0$, $P(A) > 0$, $P(B) > 0$. Set $W_1 = 1_A$, $W_2 = 1_B$, $W_3 = W_4 = \ldots = W_k = 1$.
The reverse Hölder inequality, if true, would then yield $P(A \cap B) \ge P(A)^{1/p_1} P(B)^{1/p_2}$, which is false.

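Define $(p_1, p_2, \ldots, p_k) \in \mathcal{H}(X_1; X_2; \ldots; X_k)$ if
• $p_1, p_2, \ldots, p_k \ge 1$, and $\forall f_i : \mathcal{X}_i \to \mathbb{R}$, $i = 1, 2, \ldots, k$ we have

$$\mathbb{E}\prod_{i=1}^k f_i(X_i) \le \prod_{i=1}^k \|f_i(X_i)\|_{p_i};$$

• $p_1, p_2, \ldots, p_k \le 1$, and $\forall f_i : \mathcal{X}_i \to \mathbb{R}_{>0}$, $i = 1, 2, \ldots, k$ we have

$$\mathbb{E}\prod_{i=1}^k f_i(X_i) \ge \prod_{i=1}^k \|f_i(X_i)\|_{p_i}.$$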


Remark 6. The restriction to the orthant $p_1, p_2, \ldots, p_k \ge 1$ for the forward Hölder contraction is without loss of
generality: Assuming $X_1$ is a non-constant random variable and $f_1$ is chosen so that $f_1(X_1)$ is non-constant and
$f_2, f_3, \ldots, f_k$ are chosen to be constants, the inequality will hold only if $p_1 \ge 1$. Likewise, the restriction to the
orthant $p_1, p_2, \ldots, p_k \le 1$ for the reverse Hölder contraction is without loss of generality.

It is easy to check that tensorization, data processing and appropriate semi-continuity properties continue to hold
for $\mathcal{H}(X_1; X_2; \ldots; X_k)$ so we have the following observation.


Observation 3. Non-interactive simulation of $(U_1, U_2, \ldots, U_k) \sim Q(u_1, u_2, \ldots, u_k)$ using
$(X_1, X_2, \ldots, X_k) \sim P(x_1, x_2, \ldots, x_k)$ is possible only if, for all non-empty subsets $S_1, S_2, \ldots, S_m \subseteq \{1, 2, \ldots, k\}$,
$\mathcal{H}(X_{S_1}; X_{S_2}; \ldots; X_{S_m}) \subseteq \mathcal{H}(U_{S_1}; U_{S_2}; \ldots; U_{S_m})$.

Similarly, using maximal correlation, we can make the following observation:

Observation 4. Non-interactive simulation of $(U_1, U_2, \ldots, U_k) \sim Q(u_1, u_2, \ldots, u_k)$ using
$(X_1, X_2, \ldots, X_k) \sim P(x_1, x_2, \ldots, x_k)$ is possible only if for all non-empty subsets $S_1, S_2 \subseteq \{1, 2, \ldots, k\}$,
we have $\rho_m(X_{S_1}; X_{S_2}) \ge \rho_m(U_{S_1}; U_{S_2})$.

Example 4. We define the following distributions of DSBS triples as shown in Fig. 6. For chosen $0 \le \epsilon_X, \epsilon_Y, \epsilon_Z \le \frac12$,
we define $(X,Y,Z) \sim$ DSBS-triple$(\epsilon_X, \epsilon_Y, \epsilon_Z)$ as the unique triple joint distribution satisfying $(Y,Z) \sim$ DSBS$(\epsilon_X)$,
$(X,Y) \sim$ DSBS$(\epsilon_Z)$, $(X,Z) \sim$ DSBS$(\epsilon_Y)$ (note that there are two such distributions if $\epsilon_X = \epsilon_Y = \epsilon_Z = \frac12$).
Such a distribution exists as long as the triangle inequalities $\epsilon_X + \epsilon_Y \ge \epsilon_Z$, $\epsilon_X + \epsilon_Z \ge \epsilon_Y$, $\epsilon_Y + \epsilon_Z \ge \epsilon_X$
are satisfied and the joint distribution of $(X,Y,Z)$ is given by:





Fig. 6. $(X,Y,Z) \sim$ DSBS-triple$(\epsilon_X, \epsilon_Y, \epsilon_Z)$


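$$P_{X,Y,Z}(0,0,0) = P_{X,Y,Z}(1,1,1) = \frac{2 - \epsilon_X - \epsilon_Y - \epsilon_Z}{4}, \qquad (41)$$

$$P_{X,Y,Z}(0,0,1) = P_{X,Y,Z}(1,1,0) = \frac{\epsilon_X + \epsilon_Y - \epsilon_Z}{4}, \qquad (42)$$

$$P_{X,Y,Z}(0,1,0) = P_{X,Y,Z}(1,0,1) = \frac{\epsilon_X - \epsilon_Y + \epsilon_Z}{4}, \qquad (43)$$

$$P_{X,Y,Z}(0,1,1) = P_{X,Y,Z}(1,0,0) = \frac{-\epsilon_X + \epsilon_Y + \epsilon_Z}{4}. \qquad (44)$$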

If either ^ or i? is binary-valued, then one can simply write HD 


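$$\rho_m^2(A;B) = -1 + \sum_{a,b} \frac{p_{A,B}(a,b)^2}{p_A(a)\,p_B(b)}. \qquad (45)$$

Using this simple formula, we find that the various maximal correlation terms for $(X,Y,Z) \sim$ DSBS-triple$(\epsilon_X, \epsilon_Y, \epsilon_Z)$
are given by:

$$\rho_m(X;Y) = 1 - 2\epsilon_Z, \qquad (46)$$

$$\rho_m(X;Y,Z) = \sqrt{\frac{(\epsilon_Y - \epsilon_Z)^2}{\epsilon_X} + \frac{(1 - \epsilon_Y - \epsilon_Z)^2}{1 - \epsilon_X}}. \qquad (47)$$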

Now, consider the following three-agent non-interactive simulation problem. Agents Alice, Bob, and Charlie
observe $X^n, Y^n, Z^n$ respectively and output (as a function of their observations and their private randomness)
$U, V, W$ respectively, which is required to be close in total variation to the target distribution $(U,V,W)$ as shown
in Fig. 7.

Suppose that for some $\epsilon < \frac12$, the source and target distributions are specified by $(X,Y,Z) \sim$ DSBS-triple$(\epsilon, \epsilon, \epsilon)$
and $(U,V,W) \sim$ DSBS-triple$(\epsilon, 2\epsilon(1-\epsilon), \epsilon)$ as shown in Fig. 8. In Section I-A we pointed out that for a two-agent
problem, non-interactive simulation of a DSBS target distribution with parameter $\beta \le \frac12$ using a DSBS source
distribution with parameter $\alpha \le \frac12$ is possible if and only if the target distribution is more noisy, i.e. $\alpha \le \beta$. Thus,
for this example, each pair of agents can perform the marginal pair simulation desired of them. However, the three
agents cannot simulate the desired triple joint distribution.

Using the formula (47), we get

$$\rho_m(X,Z;Y) = \frac{1 - 2\epsilon}{\sqrt{1 - \epsilon}}, \qquad (48)$$

$$\rho_m(U,W;V) = \frac{1 - 2\epsilon}{\sqrt{1 - 2\epsilon + 2\epsilon^2}}. \qquad (49)$$

Fig. 7. Three-user non-interactive simulation problem

Fig. 8. Three random variable simulation example: Every pair of agents can achieve the desired simulation but the triple cannot.


For $0 < \epsilon < \frac12$, we have $1 - 2\epsilon + 2\epsilon^2 < 1 - \epsilon$, which gives $\rho_m(X,Z;Y) < \rho_m(U,W;V)$. This shows that even
if agents Alice and Charlie were to combine their observations and their random variable generation tasks to form
one agent Alice-Charlie, then Alice-Charlie and Bob cannot achieve the desired non-interactive simulation.
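The comparison between (48) and (49) can be reproduced directly from the joint distributions by applying formula (45) with the binary variable on one side and the remaining pair on the other. The sketch below does this for one value of $\epsilon$. The function names and the choice $\epsilon = 0.1$ are illustrative, and the denominator 4 in the probability mass function is an assumption derived from normalization.

```python
def dsbs_triple(ex, ey, ez):
    """Joint pmf of DSBS-triple(ex, ey, ez), keyed by (x, y, z)."""
    pmf = {}
    for x in (0, 1):
        for y in (0, 1):
            for z in (0, 1):
                if x == y == z:
                    mass = (2 - ex - ey - ez) / 4
                elif x == y:            # z is the odd one out
                    mass = (ex + ey - ez) / 4
                elif x == z:            # y is the odd one out
                    mass = (ex - ey + ez) / 4
                else:                   # x is the odd one out
                    mass = (-ex + ey + ez) / 4
                pmf[(x, y, z)] = mass
    return pmf


def rho_m_squared_single_vs_pair(pmf, i):
    """rho_m^2 between binary coordinate i and the pair of other coordinates, via (45)."""
    p_a, p_b = {}, {}
    for t, pr in pmf.items():
        a, b = t[i], t[:i] + t[i + 1:]
        p_a[a] = p_a.get(a, 0.0) + pr
        p_b[b] = p_b.get(b, 0.0) + pr
    total = sum(
        pr * pr / (p_a[t[i]] * p_b[t[:i] + t[i + 1:]])
        for t, pr in pmf.items() if pr > 0
    )
    return total - 1.0


if __name__ == "__main__":
    eps = 0.1
    source = rho_m_squared_single_vs_pair(dsbs_triple(eps, eps, eps), 1)
    target = rho_m_squared_single_vs_pair(dsbs_triple(eps, 2 * eps * (1 - eps), eps), 1)
    print(source, target)  # squared values of (48) and (49)
```

Since the source value is strictly smaller than the target value, Observation 4 rules out the simulation even when Alice and Charlie are merged into one agent.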
Example 5. Consider the following choices of source distribution $P(x,y,z)$ and target distribution $Q(u,v,w)$.

$$P(x,y,z) = \begin{cases} a_0 & \text{if } (x,y,z) = (0,0,0), \\ a_2 & \text{if } (x,y,z) = (0,1,1), (1,0,1), (1,1,0), \end{cases}$$

where $a_0 + 3a_2 = 1$, i.e. $(X,Y,Z)$ take values on the 4 sequences that satisfy $X \oplus Y \oplus Z = 0$ (addition modulo 2).

$$Q(u,v,w) = \begin{cases} b_0 & \text{if } (u,v,w) = (0,0,0), \\ b_1 & \text{if } (u,v,w) = (0,0,1), (0,1,0), (1,0,0), \\ b_2 & \text{if } (u,v,w) = (0,1,1), (1,0,1), (1,1,0), \\ b_3 & \text{if } (u,v,w) = (1,1,1). \end{cases} \qquad (50)$$

We will choose these parameters so that for some $0 \le \gamma \le 1$, we have

$$b_0 + b_1 = a_0 + 2a_2\gamma + a_2\gamma^2, \qquad (51)$$

$$b_1 + b_2 = a_2(1 - \gamma^2), \qquad (52)$$

$$b_2 + b_3 = a_2(1 - \gamma)^2. \qquad (53)$$

Consider the question of whether $(U,V,W)$ can be simulated from $(X,Y,Z)$. For simulation of the pair $(U,V)$ from
$(X,Y)$, note that if $\Lambda_1, \Lambda_2 \sim \mathrm{Ber}(\gamma)$ i.i.d. and mutually independent of $(X,Y)$, then
$$\big(X \oplus (\Lambda_1 \cdot 1_{X=1}),\ Y \oplus (\Lambda_2 \cdot 1_{Y=1})\big) \overset{d}{=} (U,V)$$

Fig. 9. Panels: (a) $p = 1.95$; (b) $p = 1.85$. Contour plots of the ratio $\frac{\|\mathbb{E}[f(X)g(Y)|Z]\|_{p'}}{\|f(X)\|_p\|g(Y)\|_p}$ where $f(x) = (1+f)1_{x=1} + (1-f)1_{x=0}$ and $g(y) = (1+g)1_{y=1} + (1-g)1_{y=0}$. The X-axis represents the variable $f \in [-1,1]$ and the Y-axis represents the variable $g \in [-1,1]$. We see numerically that for $p = 1.95$, the ratio is upper bounded by 1 everywhere, but for $p = 1.85$, the ratio is maximized at $f = g = -1$ where it takes the value $1.0088\ldots$ This implies that $(1.95, 1.95, 1.95) \in \mathcal{H}(X;Y;Z)$ but $(1.85, 1.85, 1.85) \notin \mathcal{H}(X;Y;Z)$.


because of conditions (51), (52), (53). By symmetry then, every pair of agents can achieve the desired simulation.

Now, if we imagine two agents observe $(X,Y)$ and $Z$ respectively and are required to simulate $(U,V)$ and $W$
respectively, then again this is possible since $(X,Y)$ uniquely determines $Z$, so the agents now have access to
shared randomness which can be used to generate any required joint distribution.

However, consider the specific choice:

$$a_0 = 0.825, \quad \gamma = 0.2, \quad b_0 = 0.8,$$

so that the other parameters are fixed from (50), (51), (52), (53) to be:

$$a_2 = 0.058333\ldots, \quad b_1 = 0.0506666\ldots, \quad b_2 = 0.005333\ldots, \quad b_3 = 0.032.$$
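The dependent parameters follow from the three free choices by back-substitution. The sketch below carries out that arithmetic, using the normalization $a_0 + 3a_2 = 1$ of the source distribution together with (51), (52), (53). The function name and the dictionary layout are illustrative assumptions.

```python
def example5_parameters(a0, gamma, b0):
    """Solve a0 + 3*a2 = 1 and (51)-(53) for a2, b1, b2, b3."""
    a2 = (1.0 - a0) / 3.0                              # normalization of P
    b1 = a0 + 2 * a2 * gamma + a2 * gamma ** 2 - b0    # from (51)
    b2 = a2 * (1 - gamma ** 2) - b1                    # from (52)
    b3 = a2 * (1 - gamma) ** 2 - b2                    # from (53)
    return {"a2": a2, "b1": b1, "b2": b2, "b3": b3}


if __name__ == "__main__":
    params = example5_parameters(a0=0.825, gamma=0.2, b0=0.8)
    print(params)
    # Q must be a probability distribution: b0 + 3*b1 + 3*b2 + b3 = 1.
    print(0.8 + 3 * params["b1"] + 3 * params["b2"] + params["b3"])
```

The normalization of $Q$ is not imposed separately; it holds automatically because (51), (52), (53) equate the pairwise marginal of $Q$ with a valid distribution.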


Here, we find computationally that

$$\kappa := \inf\{p \ge 1 : (p,p,p) \in \mathcal{H}(X;Y;Z)\} = 1.93\ldots; \qquad (54)$$

$$\zeta := \inf\{p \ge 1 : (p,p,p) \in \mathcal{H}(U;V;W)\} = 2.07. \qquad (55)$$

We present numerical evidence supporting the above claims. Specifically, we will show that $1.85 \le \kappa \le 1.95$
and $\zeta \ge 2.05$.

Using Hölder's inequality, it is easy to verify that the following two statements are equivalent:

$$\mathbb{E}f(X)g(Y)h(Z) \le \|f(X)\|_p\|g(Y)\|_p\|h(Z)\|_p \quad \forall f : \mathcal{X} \to \mathbb{R},\ g : \mathcal{Y} \to \mathbb{R},\ h : \mathcal{Z} \to \mathbb{R}, \qquad (56)$$

$$\|\mathbb{E}[f(X)g(Y)|Z]\|_{p'} \le \|f(X)\|_p\|g(Y)\|_p \quad \forall f : \mathcal{X} \to \mathbb{R},\ g : \mathcal{Y} \to \mathbb{R}, \qquad (57)$$

and furthermore, equivalently, all functions above may have co-domain $\mathbb{R}_{\ge 0}$. We choose $f(x) = (1+f)1_{x=1} + (1-f)1_{x=0}$
and $g(y) = (1+g)1_{y=1} + (1-g)1_{y=0}$. It suffices to consider functions of this form since the inequalities
above are homogeneous. Fig. 9 shows contour plots of the ratio $\frac{\|\mathbb{E}[f(X)g(Y)|Z]\|_{p'}}{\|f(X)\|_p\|g(Y)\|_p}$, where the X-axis represents the
variable $f \in [-1,1]$ and the Y-axis represents the variable $g \in [-1,1]$. For $p = 1.95$, the ratio is upper-bounded
by 1, whereas for $p = 1.85$, the ratio takes the value $1.0088\ldots$ at $f = g = -1$. (Note that the color bar in Fig. 9
has a maximum value of 1.0 for $p = 1.95$ and a maximum value of a little greater than 1.0 for $p = 1.85$.) Thus,
$(1.95, 1.95, 1.95) \in \mathcal{H}(X;Y;Z)$ but $(1.85, 1.85, 1.85) \notin \mathcal{H}(X;Y;Z)$ and so, $1.85 \le \kappa \le 1.95$.
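The corner value quoted above can be recomputed from the source distribution of Example 5. The sketch below evaluates the ratio in (57) for a given $p$ and a given pair $(f, g)$ of function parameters. The function name and argument names are illustrative assumptions; the sketch does not scan the whole square, so it only confirms the violation at $p = 1.85$ and the value at the same corner for $p = 1.95$.

```python
def holder_ratio(p, f_par, g_par, a0=0.825):
    """||E[f(X)g(Y)|Z]||_{p'} / (||f(X)||_p ||g(Y)||_p) for the source of Example 5."""
    a2 = (1.0 - a0) / 3.0
    pmf = {(0, 0, 0): a0, (0, 1, 1): a2, (1, 0, 1): a2, (1, 1, 0): a2}
    p_conj = p / (p - 1.0)  # Hölder conjugate p'

    def f(x):
        return 1.0 + f_par if x == 1 else 1.0 - f_par

    def g(y):
        return 1.0 + g_par if y == 1 else 1.0 - g_par

    p_x = {0: 0.0, 1: 0.0}
    p_y = {0: 0.0, 1: 0.0}
    p_z = {0: 0.0, 1: 0.0}
    fg_z = {0: 0.0, 1: 0.0}  # accumulates E[f(X) g(Y) 1{Z=z}]
    for (x, y, z), pr in pmf.items():
        p_x[x] += pr
        p_y[y] += pr
        p_z[z] += pr
        fg_z[z] += pr * f(x) * g(y)

    cond_norm = sum(
        p_z[z] * abs(fg_z[z] / p_z[z]) ** p_conj for z in (0, 1)
    ) ** (1.0 / p_conj)
    f_norm = sum(p_x[x] * abs(f(x)) ** p for x in (0, 1)) ** (1.0 / p)
    g_norm = sum(p_y[y] * abs(g(y)) ** p for y in (0, 1)) ** (1.0 / p)
    return cond_norm / (f_norm * g_norm)


if __name__ == "__main__":
    print(holder_ratio(1.85, -1.0, -1.0))
    print(holder_ratio(1.95, -1.0, -1.0))
```

At $f = g = -1$ the ratio reduces to $a_0 (a_0 + a_2)^{-3/p}$, which gives a quick cross-check of the numerical output.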

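Now, consider the function $\delta(\theta) = 9 \cdot 1_{\theta=1} + 1_{\theta=0}$. Then,

$$\mathbb{E}\delta(U)\delta(V)\delta(W) = 26.792, \qquad (58)$$

$$\|\delta(U)\|_{2.05}\|\delta(V)\|_{2.05}\|\delta(W)\|_{2.05} = (\|\delta(U)\|_{2.05})^3 = (2.9747\ldots)^3 = 26.322\ldots < 26.792. \qquad (59)$$

This proves that $(2.05, 2.05, 2.05) \notin \mathcal{H}(U;V;W)$ and so, $\zeta \ge 2.05$.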
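The two numbers in (58) and (59) follow from the target distribution alone. The sketch below recomputes them, rebuilding $b_1, b_2, b_3$ from $a_0 = 0.825$, $\gamma = 0.2$, $b_0 = 0.8$ via (51), (52), (53). The function name and return layout are illustrative assumptions.

```python
def check_target_violation(theta=9.0, p=2.05, a0=0.825, gamma=0.2, b0=0.8):
    """Return (E[d(U)d(V)d(W)], ||d(U)||_p, ||d(U)||_p ** 3) with d(1)=theta, d(0)=1."""
    a2 = (1.0 - a0) / 3.0
    b1 = a0 + 2 * a2 * gamma + a2 * gamma ** 2 - b0   # (51)
    b2 = a2 * (1 - gamma ** 2) - b1                   # (52)
    b3 = a2 * (1 - gamma) ** 2 - b2                   # (53)
    # Q puts mass b_j on each triple containing exactly j ones.
    lhs = b0 + 3 * b1 * theta + 3 * b2 * theta ** 2 + b3 * theta ** 3
    p_u1 = b1 + 2 * b2 + b3                           # Q(U = 1); same for V and W
    norm = ((1 - p_u1) + p_u1 * theta ** p) ** (1.0 / p)
    return lhs, norm, norm ** 3


if __name__ == "__main__":
    lhs, norm, rhs = check_target_violation()
    print(lhs, norm, rhs)
```

Because the expectation exceeds the product of norms, the forward Hölder-type inequality fails at $(2.05, 2.05, 2.05)$ for the target.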

Since $\kappa \le 1.95$ and $\zeta \ge 2.05$, the inclusion $\mathcal{H}(X;Y;Z) \subseteq \mathcal{H}(U;V;W)$ is false and so, the simulation of
$(U,V,W)$ from $(X,Y,Z)$ is impossible.

Appendix

A. Proof of the claimed properties of $\rho_m$

In this subsection, we prove the claimed properties of maximal correlation.

• (data processing inequality) For any functions $\phi, \psi$, $\rho_m(X;Y) \ge \rho_m(\phi(X); \psi(Y))$.

Proof: This is straightforward from the definition of $\rho_m$.

• (tensorization) If $(X_1, Y_1)$ and $(X_2, Y_2)$ are independent, then $\rho_m(X_1, X_2; Y_1, Y_2) = \max\{\rho_m(X_1;Y_1), \rho_m(X_2;Y_2)\}$.

Proof: This property was shown by Witsenhausen CD- The following exposition of Witsenhausen’s proof is 


by Kumar [36]. If we define $|\mathcal{X}| \times |\mathcal{Y}|$ matrices $P, Q$ by $P_{x,y} = P(x,y)$ and $Q_{x,y} = \frac{P(x,y)}{\sqrt{P(x)P(y)}}$, then the
top two singular values of $Q$ are $\sigma_1(Q) = 1$ and $\sigma_2(Q) = \rho_m(X;Y)$ (for proof, see [36]). The tensorization
property then follows from the fact that the singular values of the tensor product of two matrices $A \otimes B$ are
given by $\sigma_i(A)\sigma_j(B)$.

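• (lower semi-continuity) If the space of probability distributions on $\mathcal{X} \times \mathcal{Y}$ is endowed with the total variation
distance metric, then $\rho_m(X;Y)$ is a lower semi-continuous function of the joint distribution $P(x,y)$.

Proof: Suppose $(X,Y), (X_1,Y_1), (X_2,Y_2), \ldots$ are random variable pairs taking values in the finite set $\mathcal{X} \times \mathcal{Y}$
satisfying $d_{TV}((X_n,Y_n); (X,Y)) \to 0$ as $n \to \infty$. We will show that $\rho := \liminf_{n \to \infty} \rho_m(X_n;Y_n) \ge \rho_m(X;Y)$.
Let $\{j_n\}$ be a subsequence so that $\rho = \lim_{n \to \infty} \rho_m(X_{j_n};Y_{j_n})$.

For any $\epsilon > 0$, there exists a $j(\epsilon)$ such that $\rho_m(X_{j_n};Y_{j_n}) \le \rho + \epsilon$ for all $j_n \ge j(\epsilon)$. Fix any functions
$f : \mathcal{X} \to \mathbb{R}$, $g : \mathcal{Y} \to \mathbb{R}$ such that $\mathbb{E}f(X) = \mathbb{E}g(Y) = 0$ and $\mathbb{E}f(X)^2, \mathbb{E}g(Y)^2 \le 1$. We will show
$\mathbb{E}f(X)g(Y) \le \rho$, which will complete the proof.

If $\mathbb{E}f(X)^2 = 0$ or $\mathbb{E}g(Y)^2 = 0$, there is nothing to prove. So, suppose $\mathbb{E}f(X)^2, \mathbb{E}g(Y)^2 > 0$. Since $\mathcal{X} \times \mathcal{Y}$
is a finite set, $d_{TV}((X_{j_n},Y_{j_n}); (X,Y)) \to 0$ implies that $\mathrm{Var}(f(X_{j_n})) \to \mathrm{Var}(f(X)) > 0$, $\mathrm{Var}(g(Y_{j_n})) \to \mathrm{Var}(g(Y)) > 0$.
There exists $j(f,g)$ such that $\mathrm{Var}(f(X_{j_n})) > 0$ and $\mathrm{Var}(g(Y_{j_n})) > 0$ for all $j_n \ge j(f,g)$.

Define for $j_n \ge \max\{j(\epsilon), j(f,g)\}$ the functions $f_{j_n} : \mathcal{X} \to \mathbb{R}$, $g_{j_n} : \mathcal{Y} \to \mathbb{R}$ given by

$$f_{j_n}(x) = \frac{f(x) - \mathbb{E}f(X_{j_n})}{\sqrt{\mathrm{Var}(f(X_{j_n}))}}, \qquad (60)$$

$$g_{j_n}(y) = \frac{g(y) - \mathbb{E}g(Y_{j_n})}{\sqrt{\mathrm{Var}(g(Y_{j_n}))}}, \qquad (61)$$

which is possible since for such $j_n$ we have $\mathrm{Var}(f(X_{j_n})), \mathrm{Var}(g(Y_{j_n})) > 0$.

Again, we will have $\mathbb{E}f_{j_n}(X_{j_n})g_{j_n}(Y_{j_n}) \to \frac{\mathbb{E}f(X)g(Y)}{\sqrt{\mathbb{E}f(X)^2\,\mathbb{E}g(Y)^2}} \ge \mathbb{E}f(X)g(Y)$. But by definition, we have for
$j_n \ge \max\{j(\epsilon), j(f,g)\}$ that $\mathbb{E}f_{j_n}(X_{j_n})g_{j_n}(Y_{j_n}) \le \rho_m(X_{j_n};Y_{j_n}) \le \rho + \epsilon$. This gives $\mathbb{E}f(X)g(Y) \le \rho + \epsilon$.
Since $\epsilon > 0$ was arbitrary, we have $\mathbb{E}f(X)g(Y) \le \rho$.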


B. Proof of the claimed properties of $s_p$

In this subsection, we prove the claimed properties of $s_p$ for $p \neq 1$.

• (data processing inequality) For any functions $\phi, \psi$, $s_p(X;Y) \ge s_p(\phi(X); \psi(Y))$.

Proof: Let $W = \phi(X)$, $Z = \psi(Y)$. Suppose for $1 \le q \le p$, we have $\|\mathbb{E}[g(Y)|X]\|_p \le \|g(Y)\|_q$ for all
functions $g : \mathcal{Y} \to \mathbb{R}$. For any function of $Z$, say $\theta(Z)$, we have

$$\|\mathbb{E}[\theta(Z)|W]\|_p = \|\mathbb{E}[\theta(\psi(Y))|\phi(X)]\|_p \qquad (62)$$

$$\overset{(a)}{=} \|\mathbb{E}[\mathbb{E}[\theta(\psi(Y))|X]\,|\,\phi(X)]\|_p \qquad (63)$$

$$\overset{(b)}{\le} \|\mathbb{E}[\theta(\psi(Y))|X]\|_p \qquad (64)$$

$$\le \|\theta(\psi(Y))\|_q \qquad (65)$$

$$= \|\theta(Z)\|_q, \qquad (66)$$

where (a) follows from successive conditioning and (b) follows from Jensen's inequality: $\|\mathbb{E}[A|\phi(X)]\|_p \le \|A\|_p$.
Similarly, we can deal with the case $1 \ge q \ge p$. This completes the proof.

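• (tensorization) If $(X_1, Y_1)$ and $(X_2, Y_2)$ are independent, then $s_p(X_1, X_2; Y_1, Y_2) = \max\{s_p(X_1;Y_1), s_p(X_2;Y_2)\}$.

Proof: Suppose $(X_1,Y_1) \sim P_1(x_1,y_1)$ and $(X_2,Y_2) \sim P_2(x_2,y_2)$ are both $(p,q)$-hypercontractive, with
$p < 1$, $p \neq 0$. We remark that for the case of $p = 0$, we take limits in the standard way. Then,

$$\mathbb{E}f(X_1)g(Y_1) \ge \|f(X_1)\|_{p'}\|g(Y_1)\|_q \quad \forall f : \mathcal{X}_1 \to \mathbb{R}_{>0},\ \forall g : \mathcal{Y}_1 \to \mathbb{R}_{>0}; \qquad (67)$$

$$\mathbb{E}f(X_2)g(Y_2) \ge \|f(X_2)\|_{p'}\|g(Y_2)\|_q \quad \forall f : \mathcal{X}_2 \to \mathbb{R}_{>0},\ \forall g : \mathcal{Y}_2 \to \mathbb{R}_{>0}. \qquad (68)$$

Now, fix any positive-valued functions $f : \mathcal{X}_1 \times \mathcal{X}_2 \to \mathbb{R}_{>0}$, $g : \mathcal{Y}_1 \times \mathcal{Y}_2 \to \mathbb{R}_{>0}$.

$$\mathbb{E}f(X_1,X_2)g(Y_1,Y_2) = \sum_{x_1,y_1} P_1(x_1,y_1) \sum_{x_2,y_2} P_2(x_2,y_2) f(x_1,x_2) g(y_1,y_2) \qquad (69)$$

$$\overset{(a)}{\ge} \sum_{x_1,y_1} P_1(x_1,y_1) \Big(\sum_{x_2} P_{X_2}(x_2) f(x_1,x_2)^{p'}\Big)^{1/p'} \Big(\sum_{y_2} P_{Y_2}(y_2) g(y_1,y_2)^{q}\Big)^{1/q} \qquad (70)$$

$$\overset{(b)}{\ge} \Big(\sum_{x_1} P_{X_1}(x_1) \sum_{x_2} P_{X_2}(x_2) f(x_1,x_2)^{p'}\Big)^{1/p'} \Big(\sum_{y_1} P_{Y_1}(y_1) \sum_{y_2} P_{Y_2}(y_2) g(y_1,y_2)^{q}\Big)^{1/q} \qquad (71)$$

$$= \|f(X_1,X_2)\|_{p'}\|g(Y_1,Y_2)\|_q, \qquad (72)$$

where (a) follows from (68) and (b) follows from (67). This means $((X_1,X_2),(Y_1,Y_2))$ is $(p,q)$-hypercontractive.
It is easy to see that if one of $(X_1,Y_1)$ or $(X_2,Y_2)$ is not $(p,q)$-hypercontractive, then $((X_1,X_2),(Y_1,Y_2))$ is
not $(p,q)$-hypercontractive. Thus,

$$q_p^*(X_1,X_2;Y_1,Y_2) = \min\{q_p^*(X_1;Y_1), q_p^*(X_2;Y_2)\},$$

which gives

$$s_p(X_1,X_2;Y_1,Y_2) = \max\{s_p(X_1;Y_1), s_p(X_2;Y_2)\}.$$

For $p > 1$, the proof is similar; in this case, we find

$$q_p^*(X_1,X_2;Y_1,Y_2) = \max\{q_p^*(X_1;Y_1), q_p^*(X_2;Y_2)\},$$

and

$$s_p(X_1,X_2;Y_1,Y_2) = \max\{s_p(X_1;Y_1), s_p(X_2;Y_2)\}.$$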

• (lower semi-continuity) If the space of probability distributions on $\mathcal{X} \times \mathcal{Y}$ is endowed with the total variation
distance metric, then $s_p(X;Y)$ is a lower semi-continuous function of the joint distribution $P(x,y)$.

Proof: Let us fix $p < 1$. An identical proof holds for the case of $p > 1$. Suppose $(X,Y), (X_1,Y_1), (X_2,Y_2), \ldots$
are random variable pairs taking values in the finite set $\mathcal{X} \times \mathcal{Y}$ satisfying $d_{TV}((X_n,Y_n); (X,Y)) \to 0$ as
$n \to \infty$. Let $s := \liminf_{n \to \infty} s_p(X_n;Y_n) \ge 0$. We will show that $s \ge s_p(X;Y)$. Let $\{j_n\}_{n \ge 1}$ be a
subsequence so that $s = \lim_{n \to \infty} s_p(X_{j_n};Y_{j_n})$.

We may assume without loss of generality that $s < 1$. For any $\epsilon > 0$, there exists a $j(\epsilon)$ such that
$s_p(X_{j_n};Y_{j_n}) \le s + \epsilon$ for all $j_n \ge j(\epsilon)$. We would like to show $s_p(X;Y) \le s$, i.e., that for any functions
$f : \mathcal{X} \to \mathbb{R}_{>0}$, $g : \mathcal{Y} \to \mathbb{R}_{>0}$, the following holds:

$$\mathbb{E}f(X)g(Y) \ge \|f(X)\|_{p'}\|g(Y)\|_{1+s(p-1)}. \qquad (73)$$

For any given functions $f : \mathcal{X} \to \mathbb{R}_{>0}$, $g : \mathcal{Y} \to \mathbb{R}_{>0}$, and any $j_n \ge j(\epsilon)$, we have from $s_p(X_{j_n};Y_{j_n}) \le s + \epsilon$
that

$$\mathbb{E}f(X_{j_n})g(Y_{j_n}) \ge \|f(X_{j_n})\|_{p'}\|g(Y_{j_n})\|_{1+(s+\epsilon)(p-1)}. \qquad (74)$$

From the portmanteau lemma pT] , we get 

$$\mathbb{E}f(X)g(Y) \ge \|f(X)\|_{p'}\|g(Y)\|_{1+(s+\epsilon)(p-1)}. \qquad (75)$$

Since this is true for each $\epsilon > 0$, we get from continuity of $\|\cdot\|_q$ in $q$ that

$$\mathbb{E}f(X)g(Y) \ge \|f(X)\|_{p'}\|g(Y)\|_{1+s(p-1)}. \qquad (76)$$

Since this is true for any functions $f : \mathcal{X} \to \mathbb{R}_{>0}$, $g : \mathcal{Y} \to \mathbb{R}_{>0}$, we have $s_p(X;Y) \le s$.

Remark 7. Note that this implies that $q_p^*(X;Y) = 1 + s_p(X;Y)(p-1)$ is lower semi-continuous in the joint
distribution for fixed $p > 1$ and upper semi-continuous in the joint distribution for fixed $p < 1$.

Remark 8. Lower semi-continuity of $\rho_m$ and $s_p$ was enough for our purposes. Indeed, $\rho_m$ and $s_p$ are not
continuous in the underlying joint distribution. As an example, let $(X_n, Y_n)$ be binary-valued and have a
joint probability distribution given by

$$\begin{pmatrix} \frac{1}{n} & 0 \\ 0 & 1 - \frac{1}{n} \end{pmatrix}.$$

Then, $(X_n,Y_n) \to (X,Y)$ where $(X,Y)$ has a joint probability distribution given by

$$\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$

But $\rho_m(X_n;Y_n) = s_p(X_n;Y_n) = 1$ for each $n$ and each $p \neq 1$, while $\rho_m(X;Y) = s_p(X;Y) = 0$.

However, it may be shown that if $(X,Y) \sim P(x,y)$ satisfies the assumption $P(x) > 0\ \forall x \in \mathcal{X}$, $P(y) > 0\ \forall y \in \mathcal{Y}$,
then $(X_n,Y_n) \to (X,Y)$ implies $\lim_{n \to \infty} \rho_m(X_n;Y_n) = \rho_m(X;Y)$. To see this, use the
characterization $\rho_m(X;Y) = \sigma_2(A_{X;Y})$, where the matrix $A_{X;Y}$ is specified by $[A_{X;Y}]_{x,y} = \frac{P(x,y)}{\sqrt{P(x)P(y)}}$ and
$\sigma_2(\cdot)$ is the second largest singular value [15]. Under the assumption, $A_{X_n;Y_n} \to A_{X;Y}$ and the second
largest singular value is a continuous matrix functional.

C. Limiting properties of $s_p$: Proofs of Thm. 2 and Corollary 2

As in we define for any non-negative random variable $X$, the function $\mathrm{Ent}(X) := \mathbb{E}[X \log X] - \mathbb{E}[X] \cdot \log \mathbb{E}[X]$,
where by convention $0 \log 0 := 0$. By strict convexity of the function $x \mapsto x \log x$ and Jensen's inequality,
we get that $\mathrm{Ent}(X) \ge 0$ and equality holds if and only if $X$ is a constant almost surely. Also, we note that $\mathrm{Ent}(\cdot)$
is homogeneous, that is, $\mathrm{Ent}(aX) = a\,\mathrm{Ent}(X)$ for any $a \ge 0$.

We begin by presenting first a simple lemma. 

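Lemma 1. For any random variable $Z$ satisfying $0 \le Z \le K$ for some constant $K > 0$ and $\mathbb{E}Z = 1$, and
$0 < u \le 1$, we have

$$1 + u\,\mathrm{Ent}(Z) - u^2 L_1(K) \le \|Z\|_{1+u} \le 1 + u\,\mathrm{Ent}(Z) + u^2 L_0(K), \qquad (77)$$

where $L_0(K) = \frac12 \max\{K, 1\} \max_{0 \le z \le K} z(\log z)^2$ and $L_1(K) = \big(\max_{0 \le z \le K} |z \log z|\big) + \frac12 \big(\max_{0 \le z \le K} |z \log z|\big)^2$.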
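Proof of Lemma 1. For any constant $0 < u \le 1$ and any $\theta \in \mathbb{R}$, a Taylor's series expansion yields

$$1 + u\theta \le e^{u\theta} \le 1 + u\theta + \frac{u^2}{2}\theta^2 \max\{e^{u\theta}, 1\}.$$

Thus, for any $0 \le z \le K$ for some constant $K > 0$, and $0 < u \le 1$, we have using $\theta = \log z$,

$$z + u z \log z \le z^{1+u} \le z + u z \log z + \frac{u^2}{2} z (\log z)^2 \max\{z^u, 1\}.$$

For any random variable $Z$ satisfying $0 \le Z \le K$ almost surely and any $0 < u \le 1$,

$$\mathbb{E}Z + u\mathbb{E}[Z \log Z] \le \mathbb{E}[Z^{1+u}] \le \mathbb{E}Z + u\mathbb{E}[Z \log Z] + \frac{u^2}{2} \max\{K^u, 1\}\,\mathbb{E}[Z(\log Z)^2] \le \mathbb{E}Z + u\mathbb{E}[Z \log Z] + u^2 L_0(K). \qquad (78)$$

Now, again a Taylor's expansion yields that for $0 < r \le 1$ and any $x \ge 0$, we have

$$1 + rx - \frac{x^2}{2} r(1-r) \le (1+x)^r \le 1 + rx. \qquad (79)$$

Suppose $Z$ is any random variable that satisfies $0 \le Z \le K$ and $\mathbb{E}Z = 1$. Then $\mathbb{E}[Z \log Z] = \mathrm{Ent}(Z) \ge 0$. For
any $0 < u \le 1$, we get using the lower bounds in both (78) and (79) with the choice $r = \frac{1}{1+u}$ and $x = u\,\mathrm{Ent}(Z)$,

$$1 + \frac{1}{1+u}\,u\,\mathrm{Ent}(Z) - \frac{u^2\,\mathrm{Ent}(Z)^2}{2} \cdot \frac{1}{1+u} \cdot \frac{u}{1+u} \le \big(\mathbb{E}[Z^{1+u}]\big)^{\frac{1}{1+u}}.$$

Similarly, using the upper bounds in both (78) and (79) with the choice $r = \frac{1}{1+u}$ and $x = u\,\mathrm{Ent}(Z) + u^2 L_0(K)$,
we get

$$\big(\mathbb{E}[Z^{1+u}]\big)^{\frac{1}{1+u}} \le 1 + \frac{1}{1+u}\,u\,\mathrm{Ent}(Z) + \frac{1}{1+u}\,u^2 L_0(K).$$

Putting the above two inequalities together,

$$1 + \frac{1}{1+u}\,u\,\mathrm{Ent}(Z) - \frac{u^2\,\mathrm{Ent}(Z)^2}{2} \cdot \frac{1}{1+u} \cdot \frac{u}{1+u} \le \|Z\|_{1+u} \le 1 + \frac{1}{1+u}\,u\,\mathrm{Ent}(Z) + \frac{1}{1+u}\,u^2 L_0(K).$$

Defining $L_2(K) = \max_{0 \le z \le K} |z \log z|$ and observing that for $0 < u \le 1$, we have $\frac{1}{1+u} \le 1$, we obtain

$$1 + \frac{u}{1+u}\,\mathrm{Ent}(Z) - \frac{u^3}{2} L_2(K)^2 \le \|Z\|_{1+u} \le 1 + \frac{u}{1+u}\,\mathrm{Ent}(Z) + u^2 L_0(K).$$

Further using the fact that for $0 < u \le 1$, we have $1 - u \le \frac{1}{1+u} \le 1$, we get

$$1 + u\,\mathrm{Ent}(Z) - u^2 L_2(K) - \frac{u^3}{2} L_2(K)^2 \le \|Z\|_{1+u} \le 1 + u\,\mathrm{Ent}(Z) + u^2 L_0(K).$$

Finally, since $L_1(K) = L_2(K) + \frac12 L_2(K)^2$ and $u \le 1$, we have

$$1 + u\,\mathrm{Ent}(Z) - u^2 L_1(K) \le \|Z\|_{1+u} \le 1 + u\,\mathrm{Ent}(Z) + u^2 L_0(K). \qquad (80)$$

□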
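Lemma 1 says that $\|Z\|_{1+u}$ is $1 + u\,\mathrm{Ent}(Z)$ up to an $O(u^2)$ error. The sketch below checks the two-sided bound (77) for one small example. The constants are taken as $L_0(K) = \frac12 \max\{K,1\} \max_z z(\log z)^2$ and $L_1(K) = L_2(K) + \frac12 L_2(K)^2$ with $L_2(K) = \max_z |z \log z|$, which is an assumption of this sketch about how they read, and the maxima are approximated on a grid. The function names and the two-point example are illustrative.

```python
import math


def ent(values, probs):
    """Ent(Z) = E[Z log Z] - E[Z] log E[Z], with the convention 0 log 0 := 0."""
    mean = sum(p * v for v, p in zip(values, probs))
    e_zlogz = sum(p * v * math.log(v) for v, p in zip(values, probs) if v > 0)
    return e_zlogz - mean * math.log(mean)


def norm(values, probs, q):
    """||Z||_q = (E[Z^q])^(1/q) for q > 0 and Z >= 0."""
    return sum(p * v ** q for v, p in zip(values, probs)) ** (1.0 / q)


def lemma1_constants(K, grid=20000):
    """Grid approximations of L_0(K) and L_1(K) as assumed in this sketch."""
    zs = [K * k / grid for k in range(1, grid + 1)]
    max_abs_zlogz = max(abs(z * math.log(z)) for z in zs)
    max_zlog2 = max(z * math.log(z) ** 2 for z in zs)
    l0 = 0.5 * max(K, 1.0) * max_zlog2
    l1 = max_abs_zlogz + 0.5 * max_abs_zlogz ** 2
    return l0, l1


def lemma1_bounds(values, probs, K, u):
    """Return (lower bound, ||Z||_{1+u}, upper bound) from (77)."""
    l0, l1 = lemma1_constants(K)
    e = ent(values, probs)
    return 1 + u * e - u * u * l1, norm(values, probs, 1 + u), 1 + u * e + u * u * l0


if __name__ == "__main__":
    # Z takes the values 0.5 and 1.5 with probability 1/2 each, so E[Z] = 1 and K = 2.
    print(lemma1_bounds([0.5, 1.5], [0.5, 0.5], K=2.0, u=0.1))
```

Shrinking `u` makes $(\|Z\|_{1+u} - 1)/u$ approach $\mathrm{Ent}(Z)$, which is the first-order behaviour used in the proof of Theorem 2.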


Next, we present the proof of Thm. 

Proof of Theorem. The theorem is easily seen to be true when $Y$ is a constant almost surely. We assume then that this is not the case and that $P_Y(y) > 0$ for all $y \in \mathcal{Y}$ and $P_X(x) > 0$ for all $x \in \mathcal{X}$ without loss of generality. Define

\[ s := \sup_g \frac{\mathrm{Ent}(\mathbb{E}[g(Y)|X])}{\mathrm{Ent}(g(Y))}, \]

where the supremum is taken over functions $g : \mathcal{Y} \to \mathbb{R}_{\ge 0}$ such that $g(Y)$ is not a constant almost surely.

For any distribution $R_Y(y) \ne P_Y(y)$ consider the non-constant non-negative valued function $g$ given by $g(y) := \frac{R_Y(y)}{P_Y(y)}$. This choice yields $\mathrm{Ent}(g(Y)) = D(R_Y(y)\|P_Y(y))$ and $\mathrm{Ent}(\mathbb{E}[g(Y)|X]) = D(R_X(x)\|P_X(x))$, where $R_X(x) = \sum_y P_{X|Y}(x|y) R_Y(y)$. Along with homogeneity of $\mathrm{Ent}(\cdot)$, this means that $s = s^*(Y;X)$ and thus, from the data processing inequality, $0 \le s \le 1$.
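To make the identification $s = s^*(Y;X)$ concrete, here is a minimal sketch for the doubly symmetric binary source mentioned in the abstract. It assumes uniform marginals and a crossover probability `alpha`; the helper names are hypothetical. It evaluates the ratio $D(R_X\|P_X)/D(R_Y\|P_Y)$ for $R_Y = (\tfrac12 + t, \tfrac12 - t)$; for this source the ratio should stay below $(1-2\alpha)^2$ and approach it as $t \to 0$.

```python
import math

def kl(r, p):
    """Kullback-Leibler divergence D(r || p) in nats for finite distributions."""
    return sum(ri * math.log(ri / pi) for ri, pi in zip(r, p) if ri > 0)

def divergence_ratio_dsbs(alpha, t):
    """D(R_X||P_X) / D(R_Y||P_Y) when R_Y = (1/2 + t, 1/2 - t), 0 < t < 1/2,
    and R_X is R_Y pushed through a binary symmetric channel with crossover alpha."""
    p_uniform = (0.5, 0.5)
    r_y = (0.5 + t, 0.5 - t)
    r_x = ((1 - alpha) * r_y[0] + alpha * r_y[1],
           alpha * r_y[0] + (1 - alpha) * r_y[1])
    return kl(r_x, p_uniform) / kl(r_y, p_uniform)

alpha = 0.2  # illustrative crossover probability
for t in (0.4, 0.2, 0.05, 0.001):
    print(t, divergence_ratio_dsbs(alpha, t), (1 - 2 * alpha) ** 2)
```

The ratio never exceeds 1, in line with the data processing inequality, and the supremum is approached by distributions $R_Y$ close to $P_Y$ rather than attained.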

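For non-negative $g$, we always have

\[ \|\mathbb{E}[g(Y)|X]\|_1 = \|g(Y)\|_1 \quad \forall g : \mathcal{Y} \to \mathbb{R}_{\ge 0}. \tag{81} \]

Let $\mathcal{G}$ be the set of all non-negative functions $g : \mathcal{Y} \to \mathbb{R}_{\ge 0}$ that satisfy $\|g(Y)\|_1 = 1$. Note that for any $g \in \mathcal{G}$, both $g(Y)$ and $\mathbb{E}[g(Y)|X]$ are bounded between 0 and $K := \frac{1}{\min_y P_Y(y)}$ almost surely.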

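If $0 < m < 1$ is any parameter satisfying $m < s$, then $(1+\tau, 1+m\tau) \notin \mathcal{R}(X;Y)$ for all sufficiently small $\tau > 0$. To see this, fix $g_0$ to be any function in $\mathcal{G}$ that satisfies

\[ \frac{\mathrm{Ent}(\mathbb{E}[g_0(Y)|X])}{\mathrm{Ent}(g_0(Y))} \ge m + \frac{\delta}{2}, \tag{82} \]

where $\delta := s - m$. From Lemma 1 we have that for any $g \in \mathcal{G}$,

\[ 1 + m\tau\,\mathrm{Ent}(g(Y)) - m^2\tau^2 L_1(K) \;\le\; \|g(Y)\|_{1+m\tau} \;\le\; 1 + m\tau\,\mathrm{Ent}(g(Y)) + m^2\tau^2 L_0(K), \tag{83} \]

\[ 1 + \tau\,\mathrm{Ent}(\mathbb{E}[g(Y)|X]) - \tau^2 L_1(K) \;\le\; \|\mathbb{E}[g(Y)|X]\|_{1+\tau} \;\le\; 1 + \tau\,\mathrm{Ent}(\mathbb{E}[g(Y)|X]) + \tau^2 L_0(K). \tag{84} \]

Putting together (82), (83), (84), we get the existence of $\tau_0 > 0$ such that

\[ \|\mathbb{E}[g_0(Y)|X]\|_{1+\tau} > \|g_0(Y)\|_{1+m\tau} \quad \forall \tau : 0 < \tau < \tau_0. \tag{85} \]

Hence $q^*_{1+\tau}(X;Y) \ge 1 + m\tau$ for all such $\tau$. Since $m < s$ was arbitrary, $s = s^*(Y;X) \le \liminf_{p \to 1^+} s_p(X;Y) = \liminf_{p \to 1^+} \frac{q_p^*(X;Y) - 1}{p - 1}$.

If for some $0 < m < 1$ we have $m > s$, then define for any $g \in \mathcal{G}$,

\[ \tau(g) := \max\{\zeta : 0 \le \zeta \le 1,\ \|\mathbb{E}[g(Y)|X]\|_{1+\eta} \le \|g(Y)\|_{1+m\eta} \text{ for all } 0 \le \eta \le \zeta\}. \]

From (81), we have $\tau(g) \ge 0$ for all $g \in \mathcal{G}$.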

Let $g_1 \in \mathcal{G}$ denote the constant function 1. Then, $\tau(g_1) = 1$. Lemma 2 below shows that there is an open neighborhood $U$ of $g_1$ in $\mathcal{G}$ and a constant $\tau_0 > 0$ such that $\tau(g) \ge \tau_0$ for all $g \in U$.

Over the compact set $\mathcal{G} \setminus U$, we define

\[ \tau'(g) := \max\{\zeta : 0 \le \zeta \le 1,\ 1 + \eta\,\mathrm{Ent}(\mathbb{E}[g(Y)|X]) + \eta^2 L_0(K) \le 1 + m\eta\,\mathrm{Ent}(g(Y)) - m^2\eta^2 L_1(K) \text{ for all } 0 \le \eta \le \zeta\}. \]

Then, $\tau'(g) \le \tau(g)$ from Lemma 1. And indeed,

\[ \tau'(g) = \min\left\{ \frac{m\,\mathrm{Ent}(g(Y)) - \mathrm{Ent}(\mathbb{E}[g(Y)|X])}{L_0(K) + m^2 L_1(K)},\ 1 \right\}. \]

Since $\tau'(g)$ is continuous in $g$ over $\mathcal{G} \setminus U$, and furthermore strictly positive over that set (since $m > s$ and because $\mathrm{Ent}(g(Y)) > 0$ for $g$ non-constant), we have that $\tau'$ attains its infimum over the compact set $\mathcal{G} \setminus U$. Since $\tau'(g) \le \tau(g)$, we also have that $\inf_{g \in \mathcal{G} \setminus U} \tau(g) > 0$.
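The closed form for $\tau'(g)$ comes from dividing the defining inequality by $\eta > 0$ and solving for $\eta$. A minimal sketch follows, with made-up numerical values standing in for the two entropies and the constants $L_0(K)$, $L_1(K)$; the function names are hypothetical.

```python
def tau_prime(ent_g, ent_cond, m, L0, L1):
    """Closed form: min{ (m*Ent(g(Y)) - Ent(E[g(Y)|X])) / (L0 + m^2*L1), 1 }.

    ent_g stands for Ent(g(Y)) and ent_cond for Ent(E[g(Y)|X]).
    """
    return min((m * ent_g - ent_cond) / (L0 + m * m * L1), 1.0)

def defining_inequality_holds(eta, ent_g, ent_cond, m, L0, L1):
    """Check 1 + eta*Ent(E[g|X]) + eta^2*L0 <= 1 + m*eta*Ent(g) - m^2*eta^2*L1."""
    lhs = 1 + eta * ent_cond + eta ** 2 * L0
    rhs = 1 + m * eta * ent_g - m ** 2 * eta ** 2 * L1
    return lhs <= rhs

# Illustrative numbers only; they satisfy m*Ent(g(Y)) > Ent(E[g(Y)|X]).
args = dict(ent_g=0.2, ent_cond=0.05, m=0.5, L0=2.0, L1=3.0)
zeta = tau_prime(**args)
print(zeta)
print(defining_inequality_holds(0.5 * zeta, **args))  # inside the admissible range
print(defining_inequality_holds(2.0 * zeta, **args))  # beyond it
```

The numerator is strictly positive exactly when $m\,\mathrm{Ent}(g(Y)) > \mathrm{Ent}(\mathbb{E}[g(Y)|X])$, which is what $m > s$ guarantees for non-constant $g$.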

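Then, $\inf_{g \in \mathcal{G}} \tau(g) \ge \min\{\tau_0, \inf_{g \in \mathcal{G} \setminus U} \tau(g)\} > 0$. Using homogeneity of the norm, this establishes that $(1+\tau, 1+m\tau) \in \mathcal{R}(X;Y)$ for all $0 < \tau < \tau_0$ for some $\tau_0 > 0$, so that $q^*_{1+\tau}(X;Y) \le 1 + m\tau$ for all such $\tau$. Since $m > s$ was arbitrary, $s = s^*(Y;X) \ge \limsup_{p \to 1^+} s_p(X;Y) = \limsup_{p \to 1^+} \frac{q_p^*(X;Y) - 1}{p - 1}$.

Therefore, $s = s^*(Y;X) = \lim_{p \to 1^+} s_p(X;Y) = \lim_{p \to 1^+} \frac{q_p^*(X;Y) - 1}{p - 1}$.

Similarly, we can show the reverse hypercontractive case, namely that $s = s^*(Y;X) = \lim_{p \to 1^-} s_p(X;Y) = \lim_{p \to 1^-} \frac{q_p^*(X;Y) - 1}{p - 1}$.

This completes the proof of the theorem.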
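The two halves of the argument can be illustrated on the doubly symmetric binary source, for which $s^*(Y;X) = (1-2\alpha)^2$. The sketch below assumes uniform marginals and crossover probability `alpha`, and uses the test functions $g = (1+t, 1-t)$, for which $\mathbb{E}[g(Y)|X] = (1 + (1-2\alpha)t,\ 1 - (1-2\alpha)t)$. All names are hypothetical. With slope $m$ slightly below $s$, a near-constant $g$ violates the norm inequality as in (85); with slope $m = s$, no violation is expected on the grid.

```python
def lp_norm(values, probs, p):
    """(E|V|^p)^(1/p) for a positive finite-valued random variable V."""
    return sum(pr * v ** p for v, pr in zip(values, probs)) ** (1.0 / p)

def norm_gap_dsbs(alpha, t, p, q):
    """||E[g(Y)|X]||_p - ||g(Y)||_q for g = (1 + t, 1 - t) on a DSBS(alpha)."""
    rho = 1 - 2 * alpha
    uniform = (0.5, 0.5)
    g = (1 + t, 1 - t)
    cond_exp = (1 + rho * t, 1 - rho * t)  # E[g(Y) | X = x] for x = 0, 1
    return lp_norm(cond_exp, uniform, p) - lp_norm(g, uniform, q)

alpha, p = 0.2, 1.5
s = (1 - 2 * alpha) ** 2

# Slope m = 0.9 * s < s: the pair (p, 1 + m(p - 1)) is violated by a near-constant g.
print(norm_gap_dsbs(alpha, 0.05, p, 1 + 0.9 * s * (p - 1)))

# Slope m = s: the gap stays non-positive across the grid of test functions.
print(max(norm_gap_dsbs(alpha, k / 20, p, 1 + s * (p - 1)) for k in range(1, 20)))
```

A grid over two-point functions is only a spot check, not a proof of membership in $\mathcal{R}(X;Y)$.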


Lemma 2. When $1 > m > s$, there exists an open neighborhood $U$ of the constant function $g_1$ in $\mathcal{G}$ and a constant $\tau_0 > 0$ such that $\tau(g) \ge \tau_0$ for all $g \in U$.

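Proof of Lemma 2. Let $\mathcal{F}$ denote the set of all functions $f : \mathcal{Y} \to \mathbb{R}$ such that $\mathbb{E}[f(Y)] = 0$ and $\mathbb{E}[f(Y)^2] = 1$. For any $f \in \mathcal{F}$, and any $y \in \mathcal{Y}$, we have $|f(y)| \le \frac{1}{\sqrt{P_Y(y)}} \le \frac{1}{\min_{y'} \sqrt{P_Y(y')}}$.

For $0 < \epsilon_0 < \frac{1}{2}\min_y \sqrt{P_Y(y)}$, the set $U(\epsilon_0) := \{g_1 + \epsilon f : f \in \mathcal{F},\ 0 \le \epsilon < \epsilon_0\}$ is an open neighborhood of the constant function $g_1$ in $\mathcal{G}$. Furthermore, $\frac{1}{2} \le g(y) \le \frac{3}{2}$ for all $y \in \mathcal{Y}$ and all $g \in U(\epsilon_0)$.

Let $m = (1+\delta)s$ where $s < 1$ and $m < 1$ and where $\delta > 0$.

For $g \in \mathcal{G}$, denote $\chi_g(x) = \mathbb{E}[g(Y)|X = x]$ and note that $\frac{1}{2} \le \chi_g(x) \le \frac{3}{2}$ for all $x \in \mathcal{X}$ whenever $g \in U(\epsilon_0)$.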

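Now, for $0 \le \eta \le 1$,

\[ \begin{aligned} \|g(Y)\|_{1+m\eta} &= \left(\mathbb{E}\left[g(Y)\,e^{m\eta\log g(Y)}\right]\right)^{\frac{1}{1+m\eta}} && (86) \\ &\ge e^{\frac{m\eta}{1+m\eta}\,\mathrm{Ent}(g(Y))} && (87) \\ &\ge e^{\frac{m\eta}{s(1+m\eta)}\,\mathrm{Ent}(\mathbb{E}[g(Y)|X])} && (88) \\ &\ge e^{(1+\delta)\frac{\eta}{1+\eta}\,\mathrm{Ent}(\chi_g(X))} && (89) \\ &\ge \left(1 + \eta(1+\delta)\,\mathrm{Ent}(\chi_g(X)) + \frac{\eta^2}{2}(1+\delta)^2\,\mathrm{Ent}(\chi_g(X))^2\right)^{\frac{1}{1+\eta}}, && (90) \end{aligned} \]

where (87) follows from convexity of the exponential function, (88) follows from the definition of $s$, (89) follows from $m = (1+\delta)s$ together with $m < 1$, and (90) follows from $e^u \ge 1 + u + \frac{u^2}{2}$ for $u \ge 0$.

Likewise, we have

\[ \begin{aligned} \|\mathbb{E}[g(Y)|X]\|_{1+\eta} &= \left(\sum_x P_X(x)\chi_g(x)\,e^{\eta\log\chi_g(x)}\right)^{\frac{1}{1+\eta}} && (91) \\ &\le \left(\sum_x P_X(x)\chi_g(x)\left(1 + \eta\log\chi_g(x) + a\eta^2(\log\chi_g(x))^2\right)\right)^{\frac{1}{1+\eta}} && (92) \\ &\le \left(1 + \eta\,\mathrm{Ent}(\chi_g(X)) + a\eta^2\sum_x P_X(x)\chi_g(x)(\log\chi_g(x))^2\right)^{\frac{1}{1+\eta}}, && (93) \end{aligned} \]

where $a \ge 1$ is a constant such that $e^u \le 1 + u + au^2$ for $|u| \le \log 2$.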
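The existence of such a constant $a$ can be checked numerically. The sketch below estimates the smallest admissible constant on a grid, under the observation that $e^u \le 1 + u + au^2$ is equivalent to $(e^u - 1 - u)/u^2 \le a$ for $u \ne 0$. The function name and grid size are arbitrary illustrative choices.

```python
import math

def smallest_quadratic_constant(bound, steps=10_000):
    """Grid estimate of sup over 0 < |u| <= bound of (e^u - 1 - u) / u^2."""
    best = 0.0
    for k in range(1, steps + 1):
        for u in (bound * k / steps, -bound * k / steps):
            # math.expm1 avoids cancellation in e^u - 1 for small |u|
            best = max(best, (math.expm1(u) - u) / (u * u))
    return best

a_min = smallest_quadratic_constant(math.log(2))
print(a_min)  # roughly 0.64, so a = 1 is admissible
```

The ratio is increasing in $u$, so the supremum sits at $u = \log 2$, where it equals $(1 - \log 2)/(\log 2)^2$.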

Note that $\mathrm{Ent}(\chi_g(X)) = D(Q_X\|P_X)$ where $Q_X(x) = P_X(x)\chi_g(x)$ for all $x \in \mathcal{X}$. By Pinsker's inequality,

\[ \mathrm{Ent}(\chi_g(X)) \ge \frac{1}{2}\left(\sum_x |P_X(x)\chi_g(x) - P_X(x)|\right)^2. \]

Thus, for all $x \in \mathcal{X}$, we have

\[ |\chi_g(x) - 1| \le \frac{1}{\min_x P_X(x)}\sqrt{2\,\mathrm{Ent}(\chi_g(X))}. \]


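If we define for $0 < \alpha < 1$ the function $\kappa(\alpha) := \max_{1-\alpha \le u \le 1+\alpha} u(\log u)^2$, where $\kappa(\alpha) \to 0$ as $\alpha \to 0$, then we have

\[ \|\mathbb{E}[g(Y)|X]\|_{1+\eta} \le \left(1 + \eta\,\mathrm{Ent}(\chi_g(X)) + a\eta^2\,\kappa\!\left(\frac{\sqrt{2\,\mathrm{Ent}(\chi_g(X))}}{\min_x P_X(x)}\right)\right)^{\frac{1}{1+\eta}}. \tag{94} \]

Using (90) and (94), we find that for any $g \in U(\epsilon_0)$, we have $\tau(g) \ge \beta(\mathrm{Ent}(\chi_g(X)))$ where

\[ \beta(\rho) := \begin{cases} 1 & \text{if } a\,\kappa\!\left(\frac{\sqrt{2\rho}}{\min_x P_X(x)}\right) - \frac{(1+\delta)^2}{2}\rho^2 \le 0, \\[1ex] \min\left\{1,\ \dfrac{2\delta\rho}{2a\,\kappa\!\left(\frac{\sqrt{2\rho}}{\min_x P_X(x)}\right) - (1+\delta)^2\rho^2}\right\} & \text{else.} \end{cases} \]

Given any $\theta > 0$, there exists $0 < \epsilon_1 < \epsilon_0$ small enough so that $\mathrm{Ent}(\chi_g(X)) \le \mathrm{Ent}(g(Y)) \le \theta$ for all $g \in U(\epsilon_1)$. This means that for all $g \in U(\epsilon_1)$, we have $\tau(g) \ge \inf_{0 < \rho \le \theta} \beta(\rho)$. Since $\kappa(\alpha) = \alpha^2 + O(\alpha^3)$ for small $\alpha > 0$, it follows that $\inf_{0 < \rho \le \theta} \beta(\rho) > 0$ for sufficiently small $\theta$. This completes the proof of the lemma.

□

□
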

Now, we present the proof of Corollary 

Proof of Corollary. If $X$ and $Y$ are independent, then it is clear that $\rho_m(X;Y) = s^*(X;Y) = 0$ and $q_p^*(X;Y) = 1$ for all $p \ne 1$. The claim is obvious in this case.

Suppose $X$ and $Y$ are not independent. Fix any $\epsilon$ satisfying $0 < \epsilon < s^*(Y;X)$. Note that by Theorems and
we have $s^*(Y;X) = \lim_{p \to 1} s_p(X;Y) \ge \rho_m^2(X;Y) > 0$.

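From Thm. we have that there exists a $\delta > 0$ such that

\[ 0 < |p - 1| < \delta \implies s^*(Y;X) - \epsilon < \frac{q_p^*(X;Y) - 1}{p - 1} < s^*(Y;X) + \epsilon. \tag{95} \]

Now, define

\[ A(\epsilon) := \left\{(p,q) : 0 < |p-1| < \delta,\ s^*(Y;X) + \epsilon \le \frac{q-1}{p-1} \le 1\right\}, \tag{96} \]

\[ \begin{aligned} B(\epsilon) := {} & \left\{(p,q) : 0 < |p-1| < \delta,\ s^*(Y;X) - \epsilon \le \frac{q-1}{p-1} \le 1\right\} \cup \{(1,1)\} \\ & \cup \left\{(p,q) : |p-1| \ge \delta,\ \rho_m^2(X;Y) \le \frac{q-1}{p-1} \le 1\right\}. \end{aligned} \tag{97} \]

From (95) and Thm. it is clear that

\[ A(\epsilon) \subseteq \mathcal{R}(X;Y) \subseteq B(\epsilon). \tag{98} \]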

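By using the duality $(p,q) \in \mathcal{R}(X;Y) \iff (q',p') \in \mathcal{R}(Y;X)$ for $p, q \ne 1$, we obtain

\[ A_1(\epsilon) \subseteq \mathcal{R}(Y;X) \subseteq B_1(\epsilon), \tag{99} \]

where

\[ A_1(\epsilon) := \left\{(p,q) : |q-1| > \frac{1}{\delta},\ s^*(Y;X) + \epsilon \le \frac{q-1}{p-1} \le 1\right\}, \tag{100} \]

\[ \begin{aligned} B_1(\epsilon) := {} & \left\{(p,q) : |q-1| > \frac{1}{\delta},\ s^*(Y;X) - \epsilon \le \frac{q-1}{p-1} \le 1\right\} \cup \{(1,1)\} \\ & \cup \left\{(p,q) : 0 < |q-1| \le \frac{1}{\delta},\ \rho_m^2(X;Y) \le \frac{q-1}{p-1} \le 1\right\}. \end{aligned} \tag{101} \]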
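The passage from $A(\epsilon), B(\epsilon)$ to $A_1(\epsilon), B_1(\epsilon)$ rests on two algebraic facts about the map $(p,q) \mapsto (q',p')$. The sketch below checks them numerically, assuming that $p'$ denotes the Hölder conjugate $p/(p-1)$; the function names are hypothetical. The map preserves the slope $\frac{q-1}{p-1}$, and the second coordinate of the image satisfies $p' - 1 = \frac{1}{p-1}$, so the condition $0 < |p-1| < \delta$ turns into a condition of the form $|q-1| > \frac{1}{\delta}$.

```python
def conjugate(p):
    """Hölder conjugate p' = p / (p - 1), defined for p != 1."""
    return p / (p - 1.0)

def dual_pair(p, q):
    """Image (q', p') of the pair (p, q) under the duality map."""
    return conjugate(q), conjugate(p)

def slope(p, q):
    """The ratio (q - 1) / (p - 1) used to describe the ribbon."""
    return (q - 1.0) / (p - 1.0)

for p, q in ((1.05, 1.02), (0.9, 0.95)):
    new_p, new_q = dual_pair(p, q)
    print((p, q), "->", (new_p, new_q), slope(p, q), slope(new_p, new_q))
```

Both a pair with $p > 1$ and a pair with $p < 1$ are included, since the corollary treats the hypercontractive and reverse hypercontractive regimes together.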
This immediately gives

\[ s^*(Y;X) - \epsilon \le \liminf_{p \to -\infty} \frac{q_p^*(Y;X) - 1}{p - 1} \le \limsup_{p \to -\infty} \frac{q_p^*(Y;X) - 1}{p - 1} \le s^*(Y;X) + \epsilon, \tag{102} \]

\[ s^*(Y;X) - \epsilon \le \liminf_{p \to \infty} \frac{q_p^*(Y;X) - 1}{p - 1} \le \limsup_{p \to \infty} \frac{q_p^*(Y;X) - 1}{p - 1} \le s^*(Y;X) + \epsilon. \tag{103} \]

Since this is true for each sufficiently small $\epsilon > 0$, interchanging $X$ and $Y$ completes the proof.

□ 